[ot][spam][crazy] crazylogs: STaR

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Tue Jul 5 16:34:18 PDT 2022


the adapter transformers training documentation is at
https://docs.adapterhub.ml/training.html . it's sparse. basically, you
train it as if it's a huggingface model, except you add two lines to
put an adapter in.
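
for concreteness, here's roughly what those two lines look like. a
minimal sketch going off the docs page above, assuming the
adapter-transformers fork of huggingface transformers; the model name
and the adapter name "my_task" are placeholders i made up:

# assumes: pip install adapter-transformers
from transformers import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("roberta-base")

# the two extra lines: create an adapter, then freeze the base
# model so only the adapter weights receive gradients
model.add_adapter("my_task")
model.train_adapter("my_task")

# from here training proceeds as with any huggingface model,
# e.g. via transformers.Trainer (the docs also mention an
# AdapterTrainer in the same package)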

then, in theory, it trains to the same effect but uses much less ram
and goes much faster, since only the small adapter weights get
gradients and optimizer state while the base model stays frozen.
dunno, haven't measured it.

the paper i'm trying to copy here, STaR, did not use a huggingface
model, so there's more for me to figure out on my own. thinking about
it, though, models are pretty much all trained the same way:
researchers put their data into relatively normative formats, then
write scripts to load those formats and run them through models. i
haven't fully thought this through yet; i'm busy learning
adapter-transformers.
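
to illustrate the "normative formats" point, the generic huggingface
loop looks roughly like this. a hedged sketch: the data file, model
choice, and hyperparameters are placeholders i made up, not anything
taken from the STaR paper:

# generic load-format-train loop; "rationales.txt" is a placeholder
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

raw = load_dataset("text", data_files={"train": "rationales.txt"})
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train = raw["train"].map(tokenize, batched=True,
                         remove_columns=["text"])

model = AutoModelForCausalLM.from_pretrained("gpt2")
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
Trainer(model=model,
        args=TrainingArguments(output_dir="out"),
        train_dataset=train,
        data_collator=collator).train()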

