[ot][spam][crazy][data] transformer model 'attention' improvement

Undiscussed Horrific Abuse, One Victim & Survivor of Many gmkarl at gmail.com
Wed Feb 2 01:58:18 PST 2022


so, slicing a torch tensor gives a view of the same storage, but indexing a jax array gives a copy (jax arrays are immutable).
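A quick sketch of the distinction, using NumPy as a stand-in (torch basic slicing shares NumPy's view semantics; fancy indexing behaves like jax's copy semantics -- this is an illustration, not the torch/jax API itself):

```python
import numpy as np

# Basic slicing returns a view, like torch tensor slicing:
# writes through the slice mutate the parent's buffer.
a = np.arange(4)
view = a[1:3]
view[0] = 99
assert a[1] == 99          # the parent array saw the write

# Fancy indexing returns a copy, closer to jax semantics,
# where arrays are immutable and every indexing op
# conceptually produces a new array.
b = np.arange(4)
copy = b[[1, 2]]
copy[0] = 99
assert b[1] == 1           # the parent array is untouched
```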

- my current work was torch-only, so it is << O(n^2) in memory if and
only if the passed matrices are not full and dense
- the jax code in memory-efficient-attention has a bug: it can't be
<< O(n^2) if a mask or bias is passed
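For context, the point of the chunked-attention approach is that the mask/bias has to be sliced per query chunk rather than applied to a full n x n score matrix; otherwise the materialized mask alone is O(n^2). A minimal NumPy sketch of that idea (hypothetical helper, not the memory-efficient-attention code itself):

```python
import numpy as np

def chunked_attention(q, k, v, bias=None, chunk=128):
    # Process queries in chunks so only a (chunk, n) score block
    # is live at a time, instead of the full (n, n) matrix.
    n = q.shape[0]
    out = np.empty_like(v, dtype=np.float64)
    for start in range(0, n, chunk):
        qc = q[start:start + chunk]            # (c, d) query chunk
        s = qc @ k.T                           # (c, n) scores
        if bias is not None:
            # Slice the bias per chunk; materializing the whole
            # (n, n) bias up front would defeat the memory savings.
            s = s + bias[start:start + chunk]
        s = s - s.max(axis=-1, keepdims=True)  # numerically stable softmax
        p = np.exp(s)
        p /= p.sum(axis=-1, keepdims=True)
        out[start:start + chunk] = p @ v
    return out
```

The chunked result matches dense softmax attention exactly; only the peak memory differs.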

I had already drafted a fix for memory-efficient-attention before
questioning whether it was needed, so I'll see if I can test and
contribute it.


More information about the cypherpunks mailing list