[ot][spam][crazy] crazylogs: STaR

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Tue Jul 5 13:13:04 PDT 2022


translation to amnesiac english:

On 7/5/22, Undiscussed Horrific Abuse, One Victim of Many
<gmkarl at gmail.com> wrote:
> ok let's try to implement STaR a little tiny smidge

by STaR I mean this paper: https://arxiv.org/abs/2203.14465

it is a technique (the Self-Taught Reasoner) for getting a generative
language model to make accurate logical and common-sense decisions
that carry over to new problems, _without_ hand-writing a ton of
rationale data or performing a ton of model training: the model
bootstraps by writing its own step-by-step rationales, keeping only
the ones that reach correct answers, and finetuning on those.
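
as i read the paper, the outer loop is roughly: sample a rationale
and answer for each problem, keep the rationales whose answers are
correct (adding the correct answer as a hint and retrying when they
aren't, which the paper calls rationalization), finetune on what was
kept, and repeat. here is a minimal python sketch of that loop;
generate_rationale() and finetune() are hypothetical placeholders for
whatever sampling and training code you actually have:

def star(base_model, problems, answers, iterations=5):
    model = base_model
    for _ in range(iterations):
        kept = []
        for problem, correct in zip(problems, answers):
            # ask the current model to reason step by step, then answer
            rationale, answer = generate_rationale(model, problem)
            if answer != correct:
                # "rationalization": give the correct answer as a hint
                # and ask the model to justify it after the fact
                rationale, answer = generate_rationale(
                    model, problem, hint=correct)
            if answer == correct:
                # keep only rationales that end in the right answer
                kept.append((problem, rationale, correct))
        # the paper restarts each finetune from the original
        # pretrained weights instead of stacking finetunes
        model = finetune(base_model, kept)
    return model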

> basically, you need to be able to finetune or train a language model.
> this means paying a few dollars for time, having a powerful GPU,

compute time can be purchased on google cloud TPUs (which might mesh
well with the paper: the model they used, GPT-J, was trained by
people working on TPUs), on vast.ai, or in many other places. it is
also common for hosted language model services to offer finetuning as
a paid feature (maybe between 20% and 60% of the services i've found
provide this).
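
whichever compute you rent, the finetuning step itself can stay
pretty small. here is a rough sketch using the huggingface
transformers library; the model name (distilgpt2) and the two toy
rationale strings are placeholders i chose for illustration, not what
the paper used:

import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilgpt2"  # placeholder: any small causal LM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# toy rationale strings standing in for real STaR training data
texts = [
    "Q: 2 + 2? let's think: two plus two is four. A: 4",
    "Q: 3 + 5? let's think: three plus five is eight. A: 8",
]

class RationaleDataset(torch.utils.data.Dataset):
    def __init__(self, texts):
        self.encodings = [
            tokenizer(t, truncation=True, padding="max_length",
                      max_length=64, return_tensors="pt")
            for t in texts
        ]
    def __len__(self):
        return len(self.encodings)
    def __getitem__(self, i):
        item = {k: v.squeeze(0) for k, v in self.encodings[i].items()}
        item["labels"] = item["input_ids"].clone()
        # don't compute loss on padding tokens
        item["labels"][item["attention_mask"] == 0] = -100
        return item

args = TrainingArguments(output_dir="star-finetune",
                         num_train_epochs=1,
                         per_device_train_batch_size=2)
trainer = Trainer(model=model, args=args,
                  train_dataset=RationaleDataset(texts))
trainer.train()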

a powerful gpu mostly means a lot of VRAM. the lowest end commonly
rented is the tesla K80; the higher-end datacenter cards have names
starting with the letter A and a big number after it (A100, A6000,
and so on). nvidia has dominated this space for a while, but other
corporations are stepping up. you can also run several gpus in
parallel when no single one has enough memory or speed, though that
means learning more code or systems to interface with them.
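
if you're not sure what you're working with, a couple of lines of
pytorch will list the cards and their memory (assuming torch is
installed and can see the gpus):

import torch

if not torch.cuda.is_available():
    print("no gpu visible to pytorch")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"gpu {i}: {props.name}, "
          f"{props.total_memory / 2**30:.1f} GiB of VRAM")

for splitting one model across several cards, libraries such as
huggingface accelerate handle a lot of the parallelism plumbing, but
that is exactly the extra system-learning mentioned above.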

> compromising with a very small model, or implementing research
> algorithms.

i commonly daydream about research since i have trouble directing my
body, so i have a lot of ideas in my head for improving things that
are very hard for me to actually try out. i haven't read much of the
research, but i get the impression there is a lot of stuff out there
that simply needs to be combined across domains. a _lot_ of stuff.
some of it gets pretty obvious if you look inside the source code of
machine learning libraries: many things have seemed unoptimized to
me. often a huge popular paper is simply performing an algebraic
simplification to reduce system load, the kind everybody was doing 40
years ago just to make anything run at all.

> my plan is to compromise with a very small model. with considering
> paying a few dollars to train on somebody else's server.

using a very small model means it won't be able to hold as many
concepts at once, or concepts as complex, since it has fewer
parameters in which to store separate information. so things only
work if the domain is kept correspondingly small.
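
for a sense of scale, here is what the small-model compromise looks
like in practice: loading a tiny causal model with huggingface
transformers and sampling a step-by-step answer from it. distilgpt2
is just an example of a very small model that fits on nearly
anything; it is not the model from the paper:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

prompt = ("Q: if it is raining, should i take an umbrella?\n"
          "A: let's think step by step.")
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

a model this small will ramble, which is fine: the STaR-style
filtering sketched above only keeps a rationale when the final answer
comes out right.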

