
2 Feb
2022
10:48 a.m.
- in Perceiver, the user-provided attention mask is expanded with 1-length dimensions and passed on, so Perceiver keeps an O(n) attention mask. I didn't note a model-associated bias. My code generates a bias to accommodate feature matching between the two codebases; that will need improvement if kept. It's 10:46 UTC. GPT-J next.
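A minimal sketch of the broadcasting the note describes, in numpy. The shapes, names, and the large-negative fill value are illustrative assumptions, not the actual Perceiver code: the point is that a per-token mask stays O(n) and the 1-length dimensions let it broadcast against the O(n²) score tensor.

```python
import numpy as np

batch, heads, q_len, k_len = 2, 4, 8, 8

# user-provided mask: one entry per key token -> O(n), not O(n^2)
mask = np.ones((batch, k_len), dtype=bool)
mask[0, -3:] = False  # pretend the last 3 tokens of example 0 are padding

# expand with 1-length dims so it broadcasts over heads and query positions
expanded = mask[:, None, None, :]  # (batch, 1, 1, k_len)

scores = np.random.randn(batch, heads, q_len, k_len)
# masked keys get a large negative value before softmax
masked = np.where(expanded, scores, -1e9)

print(expanded.shape)  # (2, 1, 1, 8)
```

The expanded mask is applied identically across every head and query position, which is exactly what the singleton dimensions encode.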