[ot][spam][crazy] adapters for semibalanced trees?

Undiscussed Groomed for Male Slavery, One Victim of Many gmkarl at gmail.com
Tue Jul 26 10:21:59 PDT 2022


i burned some time figuring out the largest number of patch tokens i
could generate with the present config on colab, which let me have a
gpu again today. the number was 2168.
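a limit like 2168 is the kind of thing you can find by binary search:
probe a size, see if it fits on the gpu, and narrow the range. this is
just a sketch of that idea; "fits" below is a stand-in for a real
forward pass that raises on cuda oom, not anything from the actual
project.

```python
# binary-search the largest n for which a probe succeeds.
# fits(n) stands in for "does a forward pass with n patch tokens
# survive without running out of gpu memory?"

def largest_fitting(fits, lo=1, hi=1 << 16):
    """Return the largest n in [lo, hi] for which fits(n) is True."""
    best = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if fits(mid):
            best = mid       # mid fits; try larger
            lo = mid + 1
        else:
            hi = mid - 1     # mid blew up; try smaller
    return best

# toy demonstration: pretend the gpu tops out at 2168 patch tokens
print(largest_fitting(lambda n: n <= 2168))  # -> 2168
```

with a real model, each probe costs one trial forward/backward pass,
so the whole search takes about sixteen tries instead of thousands.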

given colab gives me bigger gpus, it seemed to make sense to invest a
little time in figuring out how to use them better before they time
out for the day.

i've enabled embeddings for training, after figuring out the 2168
number, and i'm a little stuck on storage.

the embeddings need to be paired with their trained tokenizer, but the
tokenizer isn't being uploaded with the model.
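one stdlib-only way around that pairing problem is to copy the
tokenizer files into the model directory before uploading, so a single
upload carries both. this is a sketch under assumptions: the
huggingface-style filenames and the directory layout here are
illustrative, not the project's real paths.

```python
# sketch: bundle tokenizer files next to the model weights so that
# whatever uploads the model directory also carries the tokenizer.
# filenames are assumed huggingface-style; adjust to the real layout.
import pathlib
import shutil
import tempfile

TOKENIZER_FILES = ["tokenizer.json", "tokenizer_config.json"]

def bundle(model_dir, tokenizer_dir):
    """Copy tokenizer files into model_dir; return its final contents."""
    model_dir = pathlib.Path(model_dir)
    tokenizer_dir = pathlib.Path(tokenizer_dir)
    for name in TOKENIZER_FILES:
        src = tokenizer_dir / name
        if src.exists():
            shutil.copy(src, model_dir / name)
    return sorted(p.name for p in model_dir.iterdir())

# toy demonstration with temp directories standing in for checkpoints
with tempfile.TemporaryDirectory() as m, tempfile.TemporaryDirectory() as t:
    pathlib.Path(m, "pytorch_model.bin").touch()
    pathlib.Path(t, "tokenizer.json").write_text("{}")
    print(bundle(m, t))  # -> ['pytorch_model.bin', 'tokenizer.json']
```

the same effect can be had by saving the tokenizer into the model
directory in the first place, which avoids the copy step entirely.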


More information about the cypherpunks mailing list