note:
- additionally, the Perceiver model structure may not need tokenization
- and Google made a new T5 called LongT5 that can already handle much longer inputs; its code is usually released in the coming months

given that many functions are short, i might skip the length problem for now. but maybe, now that something is training and looks to have some success (and could be improved with management of embedded strings), it could make sense to:
- collect data for other languages
- organise the code better
- reduce memory usage for faster training
- address string encoding for faster training
- improve the training itself

it will be clearer after seeing the results of the current training. it's helpful to actually look at results. oh, here we go: it needs to save the model for continued training if interrupted, and for use after training. that's important since Colab could halt during this test. a rough checkpointing sketch is below.
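a minimal sketch of that checkpointing idea, assuming a PyTorch training loop; the model, optimizer, checkpoint path, and save interval here are placeholders for illustration, not the actual project code:

```python
# checkpointing sketch -- assumes a PyTorch loop; names are placeholders.
import os
import torch

CKPT_PATH = "checkpoint.pt"  # e.g. a path on mounted Drive so it survives a Colab halt

def save_checkpoint(model, optimizer, step, path=CKPT_PATH):
    # save everything needed to resume: weights, optimizer state, progress
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path=CKPT_PATH):
    # resume from the last saved state if a checkpoint exists, else start at step 0
    if not os.path.exists(path):
        return 0
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    return ckpt["step"]

# usage inside the training loop (placeholder model/optimizer):
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
start_step = load_checkpoint(model, optimizer)
for step in range(start_step, 1000):
    # ... forward/backward/optimizer.step() would go here ...
    if step % 100 == 0:
        save_checkpoint(model, optimizer, step)
```

saving the optimizer state along with the weights matters for resuming mid-run; saving only the weights is enough for using the model after training.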