I bundled up some inputs (poorly), extended the tokenizer to handle symbols found in code, and kept training with the same adapter even though both of those things had changed.
The loss isn't dropping very fast anymore. There's also a bug I keep running into, where code I didn't write just hangs and needs a reboot.
I've been tempted to blame the tokenization changes, which might be fixed by enabling fine-tuning for the model's embedding layer. That could help performance too. I'm not sure whether the setup supports that.
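Roughly what I have in mind, as a minimal sketch assuming a Hugging Face tokenizer/model with a PEFT LoRA adapter (I'm not pinning down my actual stack here; the model name, new tokens, and target modules are placeholders):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "some-base-model"  # placeholder, not my actual model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add the code symbols as new tokens, then grow the embedding matrix to match.
new_tokens = ["<tab>", "{{", "}}"]  # illustrative symbols
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# Ask PEFT to train (and save) the embedding layer alongside the adapter,
# so the newly added rows actually get updated instead of staying at their
# random init. Module names vary by architecture; these are illustrative.
lora_config = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, lora_config)
```

If the new-token embeddings stay frozen at their init while the adapter trains around them, a stalled loss like this is at least plausible, which is why this is my first suspect.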
It could also be the input bundling, or a mistake in data generation, or the fact that I changed the data format on it midway. A sketch of the kind of bundling I mean is below.
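For what it's worth, here's the shape of that bundling as a minimal sketch (the function name, EOS handling, and max length are illustrative, not my actual pipeline):

```python
from typing import List

def pack_examples(token_lists: List[List[int]], eos_id: int, max_len: int = 2048) -> List[List[int]]:
    """Greedily pack tokenized examples into bundles of at most max_len tokens."""
    packed, current = [], []
    for toks in token_lists:
        candidate = toks + [eos_id]  # separator between examples
        if current and len(current) + len(candidate) > max_len:
            packed.append(current)
            current = []
        # Examples longer than max_len get truncated rather than dropped.
        current.extend(candidate[:max_len])
    if current:
        packed.append(current)
    return packed

# The ways this kind of bundling can quietly hurt the loss: forgetting the
# separator between examples, letting labels or attention leak across example
# boundaries, or truncating in a way that splits the new code symbols.
```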
Before I started messing with it, the loss had hit 0.6 or 0.7 and looked like it would keep dropping. Now it's been stuck around 1.4. My gut says it's the new tokens, or how I added them, but I don't know for sure.
Anyway, I'm having trouble getting myself moving on it this afternoon, so I'm likely to just let it keep chugging.