[ot][spam][crazy][personal] watching a progress meter

k gmkarl at gmail.com
Sun Jan 16 13:04:20 PST 2022


today i started my first translation model training.
i'd guess there's something new that's better than t5, but i don't know what.

it will finish in only 3 weeks!

so i am sitting and watching it.

a running joke is that people spend their time watching models train,
rather than working with the insides of the math and the algorithms to
speed the training.  dunno, maybe they do.  seems like things are
pretty unoptimized for that to be the case, though.

when training, there's something called 'loss' which is like the
distance between the model's guess and the real thing.  it's supposed
to steadily reduce.  near the start, the loss of the setup i have begins
around 9 or 10 and quickly drops to 4 or 5, then drops slowly to 3.3 or
so, and then bounces around while dropping much more slowly.
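
a rough sketch of why a starting loss near 9 or 10 is plausible.  this
assumes the loss is per-token cross-entropy in natural log, and that the
vocabulary has 32128 entries like the public t5 configs; neither is
confirmed by my setup, and cross_entropy here is just a made-up helper,
not a library call.  a model that guesses every token with equal
probability then scores about ln(32128).

```python
import math

def cross_entropy(probs, target):
    # loss for one token: negative log of the probability
    # the model gave to the correct token
    return -math.log(probs[target])

# assumption: vocabulary size taken from the public t5 configs
vocab_size = 32128

# a model that knows nothing spreads its guess evenly over all tokens
uniform = [1.0 / vocab_size] * vocab_size
clueless_loss = cross_entropy(uniform, target=7)
print(round(clueless_loss, 2))  # 10.38

# going the other way: a loss of 3.3 corresponds to giving the
# correct token a probability of exp(-3.3), roughly 0.037 on average
prob_at_3_3 = math.exp(-3.3)
print(round(prob_at_3_3, 3))  # 0.037
```

so under those assumptions, 9 or 10 is about what an untrained guesser
would get, and 3.3 means the right token is getting a few percent of
the probability.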

i don't really know what that means, but i'm guessing my learning rate
is too high and my batch size too small, cause the jittering implies
that the updates it makes from the derivatives aren't always
improving things for the next batch.
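
a toy sketch of that guess, not my actual training run: plain
minibatch gradient descent fitting a single weight to noisy data.  the
function name, the data (y = 3x plus noise), and the numbers are all
made up for illustration.  it measures how much the per-batch loss
bounces around over the last 200 steps.

```python
import random
import statistics

def batch_loss_jitter(lr, batch_size, steps=500, seed=0):
    # hypothetical toy problem: learn w so that w * x matches y = 3x + noise
    rng = random.Random(seed)
    w = 0.0
    losses = []
    for _ in range(steps):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        ys = [3.0 * x + rng.gauss(0.0, 1.0) for x in xs]
        errs = [w * x - y for x, y in zip(xs, ys)]
        # mean squared error on this batch, recorded before the update
        losses.append(sum(e * e for e in errs) / batch_size)
        # derivative of that batch loss with respect to w
        grad = 2.0 * sum(e * x for e, x in zip(errs, xs)) / batch_size
        w -= lr * grad
    # spread of the batch loss over the last 200 steps
    return statistics.stdev(losses[-200:])

jumpy = batch_loss_jitter(lr=0.3, batch_size=2)
smooth = batch_loss_jitter(lr=0.05, batch_size=64)
print(jumpy, smooth)
```

in this toy the small-batch, high-rate run jitters a lot more.  part of
that is the weight getting knocked around by noisy updates, and part is
just that a loss averaged over 2 examples is a noisier measurement than
one averaged over 64, so jitter alone doesn't prove the updates are bad.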

yup yup yup.

i also have things in my life i could do instead of watching this.
that's kind of confusing for me :S i recently had some cognitive stuff
that changes things for me.

