[ot][spam][crazy][data] transformer model 'attention' improvement
Undiscussed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Wed Feb 2 02:48:29 PST 2022
- in Perceiver, the user-provided attention mask is expanded with
1-length dimensions and passed on (sketched below).
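
a minimal sketch of that expansion, assuming a HuggingFace-style 2D
mask of shape (batch, seq_len); expand_attention_mask is a hypothetical
name, not Perceiver's actual helper:

    import torch

    def expand_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
        # hypothetical sketch: (batch, seq_len) -> (batch, 1, 1, seq_len),
        # broadcastable across heads and query positions, so the stored
        # mask stays O(n) rather than O(n^2)
        return attention_mask[:, None, None, :]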
so Perceiver keeps an O(n) attention mask. i didn't note a
model-associated attention bias. my code generates a bias to
accommodate feature matching between the two codebases, which will
need improvement if it is kept (see the sketch below).
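
for the bias, a sketch of the usual mask-to-additive-bias conversion
(the function name and exact scheme are assumptions, not necessarily
what my feature-matching code does):

    import torch

    def mask_to_bias(expanded_mask: torch.Tensor,
                     dtype: torch.dtype = torch.float32) -> torch.Tensor:
        # 1 -> 0.0 (position attended), 0 -> large negative (blocked);
        # the result is added to the attention scores before softmax
        return (1.0 - expanded_mask.to(dtype)) * torch.finfo(dtype).min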
it's 10:46 UTC. GPT-J next.