21 Oct
2024
4:30 a.m.
so with Petals down, i was wondering: how would i finetune Llama 405B without Petals? basically, i would do it very slowly :s i was thinking of getting into the nitty gritty of backpropagation graphs and doing it layer by layer. for example, if you have to offload every layer anyway, you could update one layer's weights at the same time as forward-passing the next batch through another layer. if the weight update and the forward compute take roughly the same time, overlapping them could come close to doubling throughput.
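here's a rough sketch of the layer-by-layer part in PyTorch, with tiny `nn.Linear` blocks standing in for transformer layers. everything in it (`offloaded_train_step`, the sizes, plain SGD) is made up for illustration, not how Petals or any real 405B setup works, and the overlap itself isn't implemented; the comment just marks where a layer's update could run concurrently with the next batch's forward pass.

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# toy "model": a stack of blocks kept on CPU; only one lives on the accelerator at a time
blocks = [nn.Linear(512, 512) for _ in range(8)]
opts = [torch.optim.SGD(b.parameters(), lr=1e-4) for b in blocks]

def offloaded_train_step(x, target):
    # forward sweep: one block on the accelerator at a time, caching each
    # block's input (per-layer gradient checkpointing, so no full graph is kept)
    cached, h = [], x.to(device)
    with torch.no_grad():
        for block in blocks:
            block.to(device)          # load this layer's weights
            cached.append(h)          # remember its input for the backward sweep
            h = torch.relu(block(h))
            block.to("cpu")           # offload the weights again

    # backward sweep: last block first. recompute each block's forward with grad
    # enabled, push the incoming gradient through it, update that block's weights
    # immediately, then offload it again.
    grad_out, loss_val = None, None
    for i in reversed(range(len(blocks))):
        block, opt = blocks[i], opts[i]
        block.to(device)
        inp = cached[i].detach().requires_grad_(True)
        out = torch.relu(block(inp))
        if grad_out is None:
            # last block: gradient comes straight from the loss
            loss = nn.functional.mse_loss(out, target.to(device))
            loss_val = loss.item()
            loss.backward()
        else:
            out.backward(grad_out)
        grad_out = inp.grad   # gradient w.r.t. this block's input, for the block before it
        opt.step()            # this per-layer update is the bit that could overlap
        opt.zero_grad()       # with the next batch's forward pass (e.g. on another
        block.to("cpu")       # CUDA stream or a second process) to claw back ~2x
    return loss_val

# usage
x, target = torch.randn(4, 512), torch.randn(4, 512)
print(offloaded_train_step(x, target))
```

plain SGD keeps the sketch simple; with Adam you'd also have to shuttle each layer's optimizer state on and off the accelerator, and for a real 405B model the cached activations would need offloading too.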