[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Feb 2 02:48:29 PST 2022


- in perceiver, the user-provided attention mask is expanded with
length-1 (singleton) dimensions and passed on.
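as a rough illustration of that expansion (a minimal sketch, assuming a
huggingface-style 2D mask of shape [batch, seq]; the helper name is mine,
not the model's own):

    import torch

    def expand_attention_mask(attention_mask: torch.Tensor) -> torch.Tensor:
        # [batch, seq] -> [batch, 1, 1, seq] so it broadcasts over heads
        # and query positions when combined with the attention scores
        return attention_mask[:, None, None, :]

    mask = torch.tensor([[1, 1, 1, 0]])       # 1 = attend, 0 = padding
    print(expand_attention_mask(mask).shape)  # torch.Size([1, 1, 1, 4])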

so perceiver has an O(n) attention mask. i didn't note a
model-associated bias. my code generates a bias to accommodate feature
matching between the two codebases, which will need improvement if
kept.
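the bias i generate just turns the 0/1 mask into an additive term on the
attention scores (a sketch under that assumption; the function name is
illustrative, not from either codebase):

    import torch

    def mask_to_bias(attention_mask: torch.Tensor,
                     dtype=torch.float32) -> torch.Tensor:
        # 1 -> 0.0 (keep), 0 -> large negative, so that adding the bias
        # before softmax drives masked positions to ~0 probability
        mask = attention_mask[:, None, None, :].to(dtype)
        return (1.0 - mask) * torch.finfo(dtype).min

    bias = mask_to_bias(torch.tensor([[1, 1, 0]]))
    # scores = scores + bias   # applied before the softmax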

it's 10:46 UTC.  gptj next.

