I bundled some inputs together (poorly), extended the tokenizer to handle symbols that show up in code, and kept training with the same adapter even though both of those things had changed. The loss isn't dropping much anymore. There's also a separate bug I keep hitting where code I didn't write just hangs and needs a reboot.

My first suspect is the tokenization change, which might be resolved by enabling fine-tuning of the model's embedding layer; that would probably help performance anyway, though I'm not sure whether my setup supports it (rough sketch below). It could also be the way I bundled the inputs, a mistake in data generation, or the fact that I changed the data format on it midway through.

Before I started messing with things, the loss had hit 0.6 or 0.7 and looked like it would keep dropping; now it's stuck around 1.4. My gut says it's the new tokens, or how I added them, but I don't know for sure. Anyway, I can't get myself to dig into it this afternoon, so I'm likely to just let it keep chugging.
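If I do end up trying the embedding-layer idea, it would look roughly like this, assuming a Hugging Face transformers + PEFT stack (which may not match my actual setup); the model name, new token strings, and module names below are all placeholders. The point is just that newly added tokens need embedding rows, and those rows need to be trainable alongside the adapter.

```python
# Minimal sketch, not my actual training code: add new tokens and let their
# embeddings train alongside a LoRA adapter. Names below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "base-model-name"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# New symbol tokens need embedding rows, or they start out untrained/garbage.
new_tokens = ["<sym_arrow>", "<sym_pipe>"]  # placeholders for the code symbols
tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

# modules_to_save keeps the (resized) embedding and output layers fully
# trainable alongside the low-rank adapter, so the new rows can actually learn.
# Module names depend on the architecture; these match Llama-style models.
lora_cfg = LoraConfig(
    r=16,
    target_modules=["q_proj", "v_proj"],          # placeholder targets
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, lora_cfg)
```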