[spam] [personal] perceiver model notes

Karl Semich gmkarl at gmail.com
Thu Jan 27 22:43:41 PST 2022


I guess I'd better test this in some way before opening a pull
request, which would likely mean writing code in transformers that
uses it. I was thinking of adding it to the gpt-j model rather than
gpt2: gpt-j is more useful, and its attention code actually appears
much simpler.
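
A rough sketch of what the hookup might look like, assuming the
library's torch entry point is efficient_dot_product_attention_pt
taking (batch, seq, heads, head_dim) tensors with
query_chunk_size/key_chunk_size keywords, and that GPT-J's _attn
hands over (batch, heads, seq, head_dim); those names, layouts and
the mask handling are my assumptions here, not a verified API:

    # sketch only; the exact function name, tensor layout and mask
    # argument are assumptions about the memory-efficient-attention package
    import torch
    from memory_efficient_attention import efficient_dot_product_attention_pt

    def gptj_chunked_attn(query, key, value, attention_mask=None,
                          query_chunk_size=1024, key_chunk_size=4096):
        # GPT-J's _attn works with (batch, heads, seq, head_dim),
        # so transpose to (batch, seq, heads, head_dim) around the call
        q = query.transpose(1, 2)
        k = key.transpose(1, 2)
        v = value.transpose(1, 2)
        out = efficient_dot_product_attention_pt(
            q, k, v,
            mask=attention_mask,            # assuming a broadcastable mask argument
            query_chunk_size=query_chunk_size,
            key_chunk_size=key_chunk_size,
        )
        # the chunked path never materialises the attention weights;
        # exposing them is what the return_weights branch below is for
        return out.transpose(1, 2)

If the shapes or keywords turn out to differ, only the transposes and
the call line would need to change.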

https://github.com/AminRezaei0x443/memory-efficient-attention/compare/main...xloem:faba6371ac7faaa2040a2c26e15ae7ab87f94ce4

commit faba6371ac7faaa2040a2c26e15ae7ab87f94ce4 (HEAD -> return_weights, origin/return_weights)
Date:   Fri Jan 28 06:37:41 2022 +0000

    mostly normalised return_attentions implementations between backends, tests pass
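
As a standalone illustration of what that return_attentions option
amounts to (my own sketch, not the fork's code): chunked attention
keeps a running log-sum-exp for the output, and only when asked also
materialises the full softmax weights that transformers'
output_attentions expects.

    import torch

    def chunked_attention(q, k, v, key_chunk_size=4096, return_weights=False):
        # q, k, v: (batch, heads, seq, head_dim)
        scale = q.shape[-1] ** -0.5
        q = q * scale
        m = torch.full(q.shape[:-1], float("-inf"),
                       device=q.device, dtype=q.dtype)   # running max per query
        l = torch.zeros_like(m)                          # running sum of exp
        acc = torch.zeros_like(q)                        # running weighted value sum
        saved_scores = [] if return_weights else None
        for start in range(0, k.shape[-2], key_chunk_size):
            k_chunk = k[..., start:start + key_chunk_size, :]
            v_chunk = v[..., start:start + key_chunk_size, :]
            s = torch.matmul(q, k_chunk.transpose(-1, -2))  # (..., q_len, chunk)
            if return_weights:
                saved_scores.append(s)
            m_new = torch.maximum(m, s.amax(dim=-1))
            p = torch.exp(s - m_new.unsqueeze(-1))
            correction = torch.exp(m - m_new)
            acc = acc * correction.unsqueeze(-1) + torch.matmul(p, v_chunk)
            l = l * correction + p.sum(dim=-1)
            m = m_new
        out = acc / l.unsqueeze(-1)
        if not return_weights:
            return out, None
        # materialising the weights brings back the full (q_len, k_len) matrix,
        # which is what output_attentions wants anyway
        weights = torch.softmax(torch.cat(saved_scores, dim=-1), dim=-1)
        return out, weights

Asking for the weights reintroduces the quadratic memory cost, so it
only makes sense when output_attentions is actually requested.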

