[ot][spam][crazy][personal] watching a progress meter
today i started my first translation model training. i'd guess there's something newer and better than t5 by now, but i don't know what. it will finish in only 3 weeks! so i am sitting and watching it.

a running joke is that people spend their time watching models train, rather than working on the insides of the math and the algorithms to speed the training up. dunno, maybe they do work on that. seems like things are pretty unoptimized for that to be the case, though.

when training, there's something called 'loss', which is like the distance between the model's guess and the real thing. it's supposed to steadily go down. the loss of the setup i have starts around 9 or 10, quickly drops to 4 or 5, then drops slowly to 3.3 or so, and then bounces around while dropping much more slowly. that's all still near the start of the run. i don't really know what that means, but i'm guessing my learning rate is too high and my batch size too small, 'cause the jittering implies that the update it makes from the derivatives doesn't always improve things for the next batch. yup yup yup.

i also have things in my life i could do instead of watching this. that's kind of confusing for me :S i recently had some cognitive stuff that changes things for me.
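a toy sketch of the loss-jitter guess above, not the actual translation setup: plain sgd fitting a single weight on made-up 1-d data, run at two batch sizes. everything in it (the `train` and `jitter` helpers, the data, the learning rate, the noise level) is hypothetical and only there to show how per-batch loss bounces more when batches are small.

```python
import random
import statistics

def train(batch_size, lr, steps=2000, seed=0):
    """SGD on a toy 1-D regression y = 3x + noise; returns the per-batch losses."""
    rng = random.Random(seed)
    w = 0.0  # the single weight being learned (hypothetical toy model)
    losses = []
    for _ in range(steps):
        # draw a fresh random batch each step
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
        errs = [w * x - y for x, y in zip(xs, ys)]
        # mean squared error on this batch, and its derivative w.r.t. w
        loss = sum(e * e for e in errs) / batch_size
        grad = sum(2.0 * e * x for e, x in zip(errs, xs)) / batch_size
        w -= lr * grad  # gradient step
        losses.append(loss)
    return losses

def jitter(losses, tail=500):
    """Standard deviation of the per-batch loss over the last `tail` steps."""
    return statistics.pstdev(losses[-tail:])

if __name__ == "__main__":
    for batch_size in (4, 64):
        losses = train(batch_size, lr=0.1)
        print(
            "batch", batch_size,
            "mean loss", round(statistics.mean(losses[-500:]), 3),
            "jitter", round(jitter(losses), 3),
        )
```

in this toy both runs settle to about the same average loss, and the small-batch one just bounces more around it. part of that bounce is only because each batch is a different random sample, so a jittery per-batch loss doesn't by itself show the learning rate is too high.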
participants (1): k