[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Wed Jan 26 01:19:45 PST 2022


So basically a matmul is an einsum that contracts (sums away) the last
axis of the first operand against the second-to-last axis of the second
operand. Matrix multiplications really are sequences of dot products!
Linear algebra is slowly and painfully coming back to me.
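
To make that concrete, here's a tiny sketch (assuming torch; not part of
the original mail or the attachment) of the same contraction done both
ways:

    import torch

    a = torch.randn(2, 3, 4)   # last axis of a ...
    b = torch.randn(2, 4, 5)   # ... contracts with the second-to-last axis of b

    via_matmul = a @ b                               # shape [2, 3, 5]
    via_einsum = torch.einsum('bik,bkj->bij', a, b)  # same contraction, spelled out

    # every output entry is the dot product of a row of a with a column of b
    assert torch.allclose(via_matmul, via_einsum, atol=1e-6)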

Attached is a transcription of huggingface's perceiver attention that
works with the same example data. The 'keys/queries/values' axis ends
up being the sequence axis.  They permute the matrices so the heads
dimension sits next to the batch dimension, which lets the dot products
be done with a normal batched matmul rather than einsum and its string
parsing.
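
This isn't the attached hfperceiver_attn.py, just a rough sketch of that
permutation idea (the function and variable names here are my own, and
the shapes are assumptions):

    import torch

    def attention(q, k, v, num_heads):
        # q, k, v: [batch, seq, num_heads * head_dim]
        batch, q_len, dim = q.shape
        k_len = k.shape[1]
        head_dim = dim // num_heads

        def split_heads(x, seq_len):
            # [batch, seq, heads * head_dim] -> [batch, heads, seq, head_dim]
            return x.view(batch, seq_len, num_heads, head_dim).permute(0, 2, 1, 3)

        q, k, v = split_heads(q, q_len), split_heads(k, k_len), split_heads(v, k_len)

        # plain batched matmuls: the batch and heads axes ride along,
        # only head_dim (for scores) and seq (for mixing values) contract
        scores = q @ k.transpose(-1, -2) / head_dim ** 0.5   # [batch, heads, q_len, k_len]
        weights = scores.softmax(dim=-1)
        out = weights @ v                                     # [batch, heads, q_len, head_dim]

        # merge the heads back into the feature axis
        return out.permute(0, 2, 1, 3).reshape(batch, q_len, dim)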
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hfperceiver_attn.py
Type: text/x-python
Size: 1162 bytes
Desc: not available
URL: <https://lists.cpunks.org/pipermail/cypherpunks/attachments/20220126/a91582b7/attachment.py>

