codefudge is a set of scripts for training an adapter on the history of a git repository.

hist.bash breaks the git history into *.file files, each containing a commit message and file content, and *.commit files, each containing that file's diff within the commit. At the time of writing, each file changed in a commit becomes a separate pair, because I was low on GPU RAM; that system could be improved.

hist2json simply converts those files into data in the format the Hugging Face training scripts take. The *.file files are the input, and the *.commit (diff data) files are the output the model is trained on. So the model would possibly learn to produce the file changes that go with a given commit message and file contents.

I tried this briefly and my loss got to around 1.5 or so (from 9 or 10) within a few hours on Google Colab [before the session was terminated and GPU access disabled for it, despite my purchase of their $50 high-end plan, which apparently gives me the same 16GB GPU as the public plan at the moment; they say it depends on usage].
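To make the input/output pairing concrete, here is a rough sketch of the kind of conversion hist2json performs. It is not the actual script: the assumption that a *.file and its *.commit share a base name, the "input" and "label" field names, and the one-JSON-object-per-line layout are all guesses for illustration, and the real training script may expect different column names.

```python
import json
from pathlib import Path


def pairs_to_records(src_dir):
    """Yield one record per *.file that has a matching *.commit next to it.

    Assumption: a pair shares its base name, e.g. abc-0.file / abc-0.commit.
    """
    for file_path in sorted(Path(src_dir).glob("*.file")):
        commit_path = file_path.with_suffix(".commit")
        if not commit_path.exists():
            continue  # no diff for this input, so there is nothing to train on
        yield {
            # commit message plus file content: what the model sees
            "input": file_path.read_text(encoding="utf-8", errors="replace"),
            # the diff for that file: what the model should produce
            "label": commit_path.read_text(encoding="utf-8", errors="replace"),
        }


def write_jsonl(records, out_path):
    """Write records as one JSON object per line and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for record in records:
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Calling write_jsonl(pairs_to_records("some_dir"), "train.json") would then give a single file with one example per changed file, where "some_dir" stands in for wherever the *.file and *.commit files were written.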