[ot][spam][crazy][data] transformer model 'attention' improvement
Undiscussed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Wed Feb 2 02:54:25 PST 2022
- gptj uses a pregenerated constant causal mask that costs O(n^2) memory. Since the mask is simply a constant function of sequence index, it could instead be produced via a callback or computed inside the attention loop.
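A minimal sketch of the idea, in NumPy (function names are hypothetical, not from gptj): the pregenerated version materializes an n-by-n mask, while the loop version derives the same masking from the row index alone, so no n-by-n buffer is needed.

```python
import numpy as np

def masked_scores_pregenerated(scores):
    # Pregenerated approach: a constant O(n^2) lower-triangular mask.
    n = scores.shape[-1]
    mask = np.tril(np.ones((n, n), dtype=bool))  # n*n memory
    return np.where(mask, scores, -np.inf)

def masked_scores_in_loop(scores):
    # Loop approach: causality is a function of the index (j <= i),
    # so each row can be masked as it is processed, with no stored mask.
    n = scores.shape[-1]
    out = scores.copy()
    for i in range(n):
        out[i, i + 1:] = -np.inf  # future positions j > i are masked
    return out

scores = np.random.randn(5, 5)
assert np.array_equal(masked_scores_pregenerated(scores),
                      masked_scores_in_loop(scores))
```

Both produce identical masked score matrices; the loop form trades the constant buffer for per-row work that fuses naturally into an attention loop.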
More information about the cypherpunks mailing list