this has been going slower than needed because colab was bailing during compilation when i tried to run the model on google's tpus. today i made a google cloud vm, precompiled the model in their shell, and added precompilation support to the notebook (rough sketch of what that looks like at the bottom of this entry). it was _really_ hard to make the vm: my psychosis kept making me forget i was doing it and wander off to something else, over and over and over again. but now, using the tpus, it is much faster; days turn into minutes. i haven't poked around with it much yet.

i also found there is a series of t5 models pretrained on individual bytes instead of tokens: https://huggingface.co/docs/transformers/model_doc/byt5 exciting developments. (quick sketch of the byte-level api at the bottom too.)

big next steps:

- have the code gather much, much more data.
- try a bigger model that can learn more complex things.

i've been running the model on their shell for maybe an hour and the loss is down to about 0.5. they charge by the hour, so i should really turn it off. i'm using the lowest-end tpus so the run is a fair preview of how the notebook should perform after i terminate the vm.
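for reference, here's roughly what the precompilation step amounts to. this is only a sketch under assumptions: it assumes a pytorch/xla setup with a huggingface-style model (which matches the t5 stuff above, but i'm not committing the notebook to this exact code), and `model` and the sequence length of 128 are placeholders. the trick is just to push one dummy batch through so xla does its slow compile up front instead of stalling the first real training step:

```python
import torch
import torch_xla.core.xla_model as xm


def precompile(model, seq_len=128):
    device = xm.xla_device()  # the tpu device
    model = model.to(device)
    # one dummy batch shaped like a real training batch; xla compiles
    # per input shape, so this forces the expensive compile now
    dummy = torch.zeros((1, seq_len), dtype=torch.long, device=device)
    model(input_ids=dummy, labels=dummy)  # trace the forward graph
    xm.mark_step()  # flush the pending graph so it actually compiles
    return model
```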
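and here's what the byte-level interface to byt5 looks like, adapted from the huggingface docs linked above. you feed it raw utf-8 bytes shifted by 3 (ids 0..2 are reserved for the pad/eos/unk special tokens), no tokenizer at all; the example strings are just placeholders:

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("google/byt5-small")

num_special_tokens = 3  # ids 0, 1, 2 are pad, eos, unk

# encode input and target as raw utf-8 bytes, shifted past the special tokens
input_ids = torch.tensor([list("hello world".encode("utf-8"))]) + num_special_tokens
labels = torch.tensor([list("bonjour le monde".encode("utf-8"))]) + num_special_tokens

loss = model(input_ids, labels=labels).loss
```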