translation to amnesiatic english: On 7/5/22, Undiscussed Horrific Abuse, One Victim of Many <gmkarl@gmail.com> wrote:
ok let's try to implement STaR a little tiny smidge
by STaR I mean this paper: https://arxiv.org/abs/2203.14465 . it is a way to use a generative language model to make logical, common-sense decisions that stay accurate when transferred to new domains, _without_ hand-producing a ton of rationale data or performing a ton of model training: the model writes out its own reasoning, and you finetune it only on the reasoning that led to correct answers.
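roughly, the loop in the paper looks like this. this is just a python sketch as i understand it; generate_rationale() and finetune() are hypothetical stand-ins for a real model interface, not anything from the paper's code.

# sketch of the STaR outer loop. generate_rationale() and finetune()
# are hypothetical placeholders for a real language model interface.
def star(pretrained_model, problems, few_shot_prompt, iterations=10):
    model = pretrained_model
    for _ in range(iterations):
        train_examples = []
        for problem in problems:
            # ask the model to reason step by step, then answer
            rationale, answer = generate_rationale(
                model, few_shot_prompt, problem.question)
            if answer != problem.answer:
                # "rationalization": retry with the correct answer as a
                # hint, so the model explains a known-good answer
                rationale, answer = generate_rationale(
                    model, few_shot_prompt, problem.question,
                    hint=problem.answer)
            if answer == problem.answer:
                # keep only rationales that reached the right answer
                train_examples.append((problem.question, rationale, answer))
        # each iteration finetunes from the *original* pretrained
        # model on the rationales that worked, then loops
        model = finetune(pretrained_model, train_examples)
    return model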
basically, you need to be able to finetune or train a language model. this means paying a few dollars for compute time, having a powerful GPU, compromising with a very small model, or implementing research algorithms yourself.
time can be purchased on google cloud TPUs (this might mesh well with the paper; GPT-J, the model they used, was trained by people who used TPUs), on vast.ai, and in many other places. it is also common for language model services to provide finetuning as a paid feature (maybe between 20% and 60% of the services i've found offer it). a powerful GPU means one with a lot of VRAM: the lowest end is the Tesla K80, and the higher-end GPUs start with the letter A and then a big number, like the A100. nvidia has dominated this for a while, but other corporations are stepping up. you can run GPUs in parallel if one alone doesn't have enough RAM or speed, but that does mean learning more code or systems to interface with them.
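to make the small-model compromise concrete, here is a minimal finetuning sketch using huggingface transformers. distilgpt2 is just an example of a very small model i picked, not what the paper used, and the two training texts are made up.

# minimal causal-LM finetuning sketch with huggingface transformers.
# distilgpt2 is an example stand-in for "a very small model";
# swap in whatever fits your GPU. the example texts are made up.
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2-family has no pad token
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

texts = ["Q: is fire hot? A: fire transfers heat, so yes.",
         "Q: is ice hot? A: ice is frozen water, so no."]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

class TinyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return enc["input_ids"].shape[0]
    def __getitem__(self, i):
        # for causal LM training the labels are the input ids themselves
        return {"input_ids": enc["input_ids"][i],
                "attention_mask": enc["attention_mask"][i],
                "labels": enc["input_ids"][i]}

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=TinyDataset()).train()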
i commonly daydream about research, since i have trouble directing my body, so i have a lot of ideas in my head for improving things that are very hard for me to try out. i haven't read much of the literature, but i get the impression there is a lot of stuff out there that simply needs to be combined across domains. a _lot_ of stuff. part of it gets pretty obvious if you look inside the source code of machine learning libraries: many things have seemed unoptimized to me. often huge popular papers are simply performing an algebraic rewrite to reduce system load, like everybody was doing 40 years ago to make anything run at all.
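a made-up toy example of the kind of algebraic rewrite i mean (nothing from any particular paper):

# the same number computed two ways: factoring the constant out of
# the sum turns n multiplications into one, the same flavor of
# rewrite many papers present as their speedup.
import random
xs = [random.random() for _ in range(1000)]
a = 3.7
slow = sum(a * x for x in xs)  # n multiplications
fast = a * sum(xs)             # 1 multiplication
assert abs(slow - fast) < 1e-6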
my plan is to compromise with a very small model, while considering paying a few dollars to train on somebody else's server.
using a very small model means it won't be able to hold as many concepts in parallel, or concepts as complex, since it won't have as many places to store separate information. so things only work if the domain is kept small.
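as an illustration of keeping the domain small, even a tiny model can complete a narrow, repetitive prompt format. distilgpt2 here is again just an example choice, and the prompt is made up:

# sketch: a very small model handling a deliberately tiny domain.
# distilgpt2 is an example stand-in for "a very small model".
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")
prompt = "Q: is fire hot? A:"
out = generator(prompt, max_new_tokens=20, do_sample=False)
print(out[0]["generated_text"])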