[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, Victim & Survivor of gmkarl at gmail.com
Sun Jan 30 05:54:45 PST 2022


I've realised that the return_attentions addition I attempted to make to
memory-efficient-transformers may have completely negated the memory
savings of the research paper, by allocating a matrix sized
queries x keys for the entire execution. If true, my pull request
could be confusing and harmful to the developer.
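
To illustrate the concern, here is a minimal sketch (not the repository's
actual code) of the chunked-attention pattern, with illustrative names and a
hypothetical return_attentions flag. The running accumulators are sized by
queries only, but returning the attention weights forces allocation of the
full queries x keys matrix that chunking was meant to avoid:

    import torch

    def chunked_attention(q, k, v, key_chunk=256, return_attentions=False):
        # q: [Q, d], k: [K, d], v: [K, d]; single head, no masking.
        scale = q.shape[-1] ** -0.5
        num = torch.zeros_like(q)                                  # weighted-value numerator, O(Q)
        den = torch.zeros(q.shape[0], 1, dtype=q.dtype, device=q.device)   # softmax denominator, O(Q)
        mx = torch.full((q.shape[0], 1), -float("inf"),
                        dtype=q.dtype, device=q.device)            # running max for stability, O(Q)

        # This allocation is the problem: a full Q x K matrix, kept for the
        # whole execution, regardless of chunking.
        attn = (torch.empty(q.shape[0], k.shape[0], dtype=q.dtype, device=q.device)
                if return_attentions else None)

        for start in range(0, k.shape[0], key_chunk):
            kc = k[start:start + key_chunk]
            vc = v[start:start + key_chunk]
            scores = (q @ kc.T) * scale                    # only Q x key_chunk at a time
            new_max = torch.maximum(mx, scores.amax(dim=-1, keepdim=True))
            correction = torch.exp(mx - new_max)           # rescale earlier accumulations
            weights = torch.exp(scores - new_max)
            num = num * correction + weights @ vc
            den = den * correction + weights.sum(dim=-1, keepdim=True)
            mx = new_max
            if return_attentions:
                attn[:, :start] *= correction              # keep earlier columns consistent
                attn[:, start:start + kc.shape[0]] = weights

        out = num / den
        if return_attentions:
            return out, attn / den                         # normalised attention weights
        return out, None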

I should re-review the paper to understand how much memory is actually
saved, and whether my feature is appropriate for the algorithm. If not,
the feature would simply have to be disabled in the transformers library
whenever chunking is used.
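
As a sketch of what disabling it could look like, assuming a hypothetical
pair of arguments (not the library's actual API), the incompatible
combination would simply be rejected up front:

    from typing import Optional

    def validate_attention_args(return_attentions: bool,
                                query_chunk_size: Optional[int]) -> None:
        # Hypothetical guard: chunked (memory-efficient) attention cannot
        # return the full attention matrix without giving up its savings.
        if return_attentions and query_chunk_size is not None:
            raise ValueError(
                "return_attentions is unavailable when chunked attention is "
                "enabled; the full queries x keys matrix would have to be "
                "materialised."
            )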
