An extant concern is this:

    Loading commits for ../repos/NWScript/
    commit: a7278603f354eb683296f443e49621e50d35fcf0
    looping over diff: a7278603f354eb683296f443e49621e50d35fcf0
    process2: /usr/include/rapidjson/internal/stack.h:129: T* rapidjson::internal::Stack<Allocator>::PushUnsafe(std::size_t) [with T = char; Allocator = rapidjson::CrtAllocator; std::size_t = long unsigned int]: Assertion `stackTop_ + sizeof(T) * count <= stackEnd_' failed.
    xargs: ./process2: terminated by signal 6

I got this error some hours ago, just trying to run the code on all the data. Signal 6 is SIGABRT, raised by the failed assertion. When I reran the code on only that repository, it enumerated all the commits, despite the trees being very large, and did not crash at all. I'm currently running it under valgrind to see if there's anything to catch.

This is my test command line:

    shuf --echo ../repos/*/ --zero-terminated | xargs -0 ./process2 | tee test.json | cut -c1-160

I use zero termination because some of the repo paths have spaces in them, which xargs's default whitespace splitting doesn't handle well. (Each repo is named after a code language and collects many remotes written in that language.)

It would be helpful to get shuf behaving deterministically, so that an order that triggers the crash can be replayed. A simple approach would be to cache its output in a file.
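The caching idea could look roughly like the sketch below. It assumes GNU shuf and xargs; the function names cache_order and replay_order and the file name order.bin are made up for illustration and are not part of the existing setup. The shuffle runs only when no cached order exists, so every later run feeds process2 the same sequence of repos.

```shell
# cache_order FILE PATH...
# Shuffle PATH... once and store the result NUL-terminated in FILE.
# (Hypothetical helper; FILE is whatever cache name you choose.)
cache_order() {
    order_file=$1
    shift
    # Keep an existing non-empty cache so reruns see the same order.
    [ -s "$order_file" ] || shuf --echo --zero-terminated -- "$@" > "$order_file"
}

# replay_order FILE COMMAND...
# Feed the cached NUL-delimited list to COMMAND through xargs -0,
# so paths containing spaces stay intact.
replay_order() {
    order_file=$1
    shift
    xargs -0 "$@" < "$order_file"
}

# Usage with the command line above (order.bin is a hypothetical name):
#   cache_order order.bin ../repos/*/
#   replay_order order.bin ./process2 | tee test.json | cut -c1-160
```

Deleting order.bin forces a fresh shuffle. Keeping the file from a crashing run preserves the exact order for replay under valgrind. GNU shuf also accepts --random-source=FILE, which may be an alternative way to fix the order without storing the list itself.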