28 Jan
2022
6:43 a.m.
I guess I'd better test using this in some way before opening a pull request, which would likely mean code in transformers that uses it. I was thinking of adding it to the gpt-j model instead of gpt2: it's more useful, and its attention code actually looks much simpler.

https://github.com/AminRezaei0x443/memory-efficient-attention/compare/main.....

commit faba6371ac7faaa2040a2c26e15ae7ab87f94ce4 (HEAD -> return_weights, origin/return_weights)
Date:   Fri Jan 28 06:37:41 2022 +0000

    mostly normalised return_attentions implementations between backends, tests pass
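Roughly what I have in mind, as a sketch only; I'm assuming the PyTorch entry point is efficient_dot_product_attention_pt with a mask keyword, and that the return_weights branch adds a return_weights flag, none of which is confirmed here:

import torch
from memory_efficient_attention import efficient_dot_product_attention_pt

def gptj_style_attn(query, key, value, attention_mask=None):
    # Stand-in for the inner score/softmax/weighted-sum step of GPT-J's
    # attention. The library expects (..., seq, heads, head_dim), so
    # transformers' (batch, heads, seq, head_dim) tensors would need a
    # transpose before and after this call.
    attn_output = efficient_dot_product_attention_pt(
        query, key, value, mask=attention_mask
    )
    # On the return_weights branch the idea would be something like:
    # attn_output, attn_weights = efficient_dot_product_attention_pt(
    #     query, key, value, mask=attention_mask, return_weights=True
    # )
    return attn_output

if __name__ == "__main__":
    b, s, h, d = 2, 128, 16, 64
    q, k, v = (torch.randn(b, s, h, d) for _ in range(3))
    print(gptj_style_attn(q, k, v).shape)  # torch.Size([2, 128, 16, 64])

The appeal of doing it in gpt-j is that only that inner step would swap out; the surrounding head splitting and merging could stay as-is.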