
2 Feb
2022
10:48 a.m.
- in Perceiver, the user-provided attention mask is expanded with 1-length dimensions and passed on, so Perceiver keeps an O(n) attention mask. I didn't note a model-associated bias. My code generates a bias to accommodate feature matching between the two codebases; that will need improvement if kept. It's 10:46 UTC. GPT-J next.
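A minimal sketch of the broadcasting the note describes, in numpy. The shapes, names, and the large-negative fill value are illustrative assumptions, not the actual Perceiver code: the point is that a per-token mask stays O(n) and the 1-length dimensions let it broadcast against the O(n²) score tensor.

```python
import numpy as np

batch, heads, q_len, k_len = 2, 4, 8, 8

# user-provided mask: one entry per key token -> O(n), not O(n^2)
mask = np.ones((batch, k_len), dtype=bool)
mask[0, -3:] = False  # pretend the last 3 tokens of example 0 are padding

# expand with 1-length dims so it broadcasts over heads and query positions
expanded = mask[:, None, None, :]  # (batch, 1, 1, k_len)

scores = np.random.randn(batch, heads, q_len, k_len)
# masked keys get a large negative value before softmax
masked = np.where(expanded, scores, -1e9)

print(expanded.shape)  # (2, 1, 1, 8)
```

The expanded mask is applied identically across every head and query position, which is exactly what the singleton dimensions encode.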