1 Feb 2022
9:18 p.m.
re the masks and biases: the chunking code currently assumes they are dense matrices, but if you change the chunking code you can pass in only the data each chunk actually needs. i'm working on that now. it may turn out that the optimization isn't worthwhile for models that store a dense mask or bias as an on-disk weight.
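a minimal sketch of the idea, assuming a chunked attention loop with a running (online) softmax over key chunks. `bias_fn`, `dense_bias_fn` and `causal_bias_fn` are hypothetical names for illustration, not the actual chunking code's API. the chunk loop asks for only the bias slice it needs, so the same loop works whether that slice is cut out of a full dense matrix or generated on the fly (a causal mask is used as the example of the latter).

```python
import math
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical callback: for query row i and key range [start, end), return
# only the bias values for that slice (-inf marks a masked-out position).
BiasFn = Callable[[int, int, int], Vector]

NEG_INF = float("-inf")


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def chunked_attention(q: List[Vector], k: List[Vector], v: List[Vector],
                      chunk_size: int, bias_fn: Optional[BiasFn] = None) -> List[Vector]:
    """Softmax attention computed one key chunk at a time with a running
    softmax, so no n x n score or bias matrix is built inside the loop."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        m = NEG_INF              # running max of the scores seen so far
        s = 0.0                  # running sum of exp(score - m)
        acc = [0.0] * len(v[0])  # running sum of exp(score - m) * v_j
        for start in range(0, len(k), chunk_size):
            end = min(start + chunk_size, len(k))
            bias = bias_fn(i, start, end) if bias_fn else [0.0] * (end - start)
            scores = [dot(qi, k[j]) * scale + b for j, b in zip(range(start, end), bias)]
            chunk_max = max(scores)
            if chunk_max == NEG_INF:  # whole chunk masked out, nothing to add
                continue
            new_m = max(m, chunk_max)
            rescale = math.exp(m - new_m)  # exp(-inf) == 0.0 on the first live chunk
            weights = [math.exp(sc - new_m) for sc in scores]
            s = s * rescale + sum(weights)
            acc = [a * rescale + sum(w * v[j][c] for w, j in zip(weights, range(start, end)))
                   for c, a in enumerate(acc)]
            m = new_m
        out.append([a / s for a in acc])
    return out


def dense_bias_fn(bias: List[Vector]) -> BiasFn:
    # Dense path: the full n x n matrix must already exist (e.g. loaded as a
    # weight); each chunk only slices it.
    return lambda i, start, end: bias[i][start:end]


def causal_bias_fn(i: int, start: int, end: int) -> Vector:
    # On-the-fly path: build only the slice this chunk needs.
    return [0.0 if j <= i else NEG_INF for j in range(start, end)]


# Usage: both paths give the same result for a causal mask.
n = 6
q = [[math.sin(i + c) for c in range(3)] for i in range(n)]
k = [[math.cos(i * c) for c in range(3)] for i in range(n)]
v = [[float(i + c) for c in range(2)] for i in range(n)]
dense = [[0.0 if j <= i else NEG_INF for j in range(n)] for i in range(n)]
out_dense = chunked_attention(q, k, v, chunk_size=4, bias_fn=dense_bias_fn(dense))
out_fly = chunked_attention(q, k, v, chunk_size=4, bias_fn=causal_bias_fn)
```

the on-the-fly path never holds more than `chunk_size` bias values per query row. the dense path still needs the whole matrix in memory before the loop starts, which is why the savings may not carry over to models that ship the mask or bias as a stored dense weight.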