[ot][spam][crazy][data] transformer model 'attention' improvement
Undiscussed Horrific Abuse, One Victim & Survivor of Many
gmkarl at gmail.com
Wed Feb 2 01:58:18 PST 2022
so, torch tensor operations like slicing return views, but jax arrays are immutable, so the equivalent operations return copies.
- my current work was torch-only, so it is << O(n^2) if and only if the
passed matrices are not full and dense
- the jax code in memory-efficient-attention has a bug: it can't be
<< O(n^2) if a mask or bias is passed
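The memory issue above can be sketched in code. This is a hedged illustrative sketch in numpy (not the actual torch or jax memory-efficient-attention implementation; function and variable names are my own): attention is computed over key/value chunks with a streaming softmax, and the crucial point for the bug is that any bias (or additive mask) must be sliced per chunk too — materializing the full (n_q, n_k) bias matrix would put memory back at O(n^2).

```python
import numpy as np

def chunked_attention(q, k, v, bias=None, chunk=64):
    """Illustrative chunked attention: process keys/values in chunks,
    keeping only an (n_q, chunk) score slab live at a time."""
    n_q, d = q.shape
    n_k = k.shape[0]
    m = np.full(n_q, -np.inf)      # running row-wise max (for stable softmax)
    s = np.zeros(n_q)              # running softmax denominator
    o = np.zeros((n_q, d))         # running weighted sum of values
    for start in range(0, n_k, chunk):
        end = min(start + chunk, n_k)
        scores = q @ k[start:end].T / np.sqrt(d)     # (n_q, chunk) slab
        if bias is not None:
            # slice the bias per chunk -- never materialize all of it
            scores = scores + bias[:, start:end]
        m_new = np.maximum(m, scores.max(axis=1))
        scale = np.exp(m - m_new)                    # rescale old partials
        p = np.exp(scores - m_new[:, None])
        s = s * scale + p.sum(axis=1)
        o = o * scale[:, None] + p @ v[start:end]
        m = m_new
    return o / s[:, None]
```

The streaming max/denominator bookkeeping is the standard online-softmax trick; the chunked result matches full softmax(QK^T/sqrt(d) + bias)V exactly, just without ever holding the full score or bias matrix.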
I already drafted a fix for memory-efficient-attention before
questioning if it was needed, so I'll see if I can test and contribute
it.
More information about the cypherpunks mailing list