Re: [ot][spam][crazy][data] transformer model 'attention' improvement

1 Feb 2022


re the masks and biases: the chunking code assumes they are dense
matrices, but by changing the chunking code you can pass in only the
data each chunk needs. i'm presently doing that. the optimization may
turn out not to be worthwhile on models that store a dense mask or
bias as an on-disk weight.
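
a rough sketch of the idea, not the actual code being changed: chunked attention where the bias is requested one block at a time through a callback, so the full dense matrix is never built. all names here (`chunked_attention`, `bias_fn`, `causal_bias_block`, the chunk sizes) are made up for illustration, and the running-max softmax accumulation is just one common way to do the chunking. a large finite negative value stands in for the mask to keep the arithmetic free of NaNs.

```python
import numpy as np

def causal_bias_block(qs, qe, ks, ke):
    # Hypothetical callback: builds only the (qe-qs, ke-ks) block of a
    # causal bias, instead of slicing a precomputed dense matrix.
    q_idx = np.arange(qs, qe)[:, None]
    k_idx = np.arange(ks, ke)[None, :]
    return np.where(k_idx <= q_idx, 0.0, -1e9)

def dense_attention(q, k, v, bias):
    # Reference implementation that needs the full dense bias.
    s = q @ k.T / np.sqrt(q.shape[-1]) + bias
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ v

def chunked_attention(q, k, v, bias_fn, q_chunk=2, k_chunk=2):
    # bias_fn(qs, qe, ks, ke) returns the bias block for those rows/columns.
    n_q, d = q.shape
    n_k = k.shape[0]
    out = np.empty((n_q, v.shape[-1]))
    for qs in range(0, n_q, q_chunk):
        qe = min(qs + q_chunk, n_q)
        running_max = np.full((qe - qs, 1), -np.inf)
        running_sum = np.zeros((qe - qs, 1))
        acc = np.zeros((qe - qs, v.shape[-1]))
        for ks in range(0, n_k, k_chunk):
            ke = min(ks + k_chunk, n_k)
            s = q[qs:qe] @ k[ks:ke].T / np.sqrt(d) + bias_fn(qs, qe, ks, ke)
            new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
            scale = np.exp(running_max - new_max)  # rescale earlier partial sums
            w = np.exp(s - new_max)
            running_sum = running_sum * scale + w.sum(axis=-1, keepdims=True)
            acc = acc * scale + w @ v[ks:ke]
            running_max = new_max
        out[qs:qe] = acc / running_sum
    return out

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((5, 4)) for _ in range(3))
chunked = chunked_attention(q, k, v, causal_bias_block)
dense = dense_attention(q, k, v, causal_bias_block(0, 5, 0, 5))
```

if the model ships its mask or bias as a stored dense weight, `bias_fn` would just slice that weight, which is why the saving may not materialize there.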

Undiscussed Horrific Abuse, One Victim & Survivor of Many