[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Tue Feb 1 13:18:41 PST 2022


Re the masks and biases: the chunking code currently assumes they are
dense matrices, but by changing the chunking code you can pass only
the slice of data each chunk actually needs. I'm working on that now.
The optimization may turn out not to be worthwhile for models that
store a dense mask or bias as an on-disk weight.
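To make the idea concrete, here is a minimal numpy sketch (my own illustration, not the actual chunking code being discussed): attention is computed one query chunk at a time, and instead of handing the full dense bias matrix to every chunk, only the rows of the bias that the current query chunk needs are sliced out and added. The function name and shapes are assumptions for the example.

```python
import numpy as np

def chunked_attention(q, k, v, bias, chunk=128):
    """Attention computed over query chunks; each chunk receives only
    its own slice of the (possibly large) dense bias matrix."""
    n, d = q.shape
    out = np.empty_like(q)
    for i in range(0, n, chunk):
        qi = q[i:i + chunk]                      # (c, d) query chunk
        scores = qi @ k.T / np.sqrt(d)           # (c, n) chunk scores
        scores = scores + bias[i:i + chunk, :]   # slice: only the rows this chunk needs
        # numerically stable softmax over the key axis
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[i:i + chunk] = w @ v
    return out
```

Chunking over keys as well would require an online softmax, but even this query-only version shows the point: the full dense mask/bias never has to be materialized inside any single chunk's computation.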


More information about the cypherpunks mailing list