26 Jan 2022
9:19 a.m.
So basically a matmul is an einsum that sums over (and drops) the last axis of the first operand and the second-to-last axis of the second operand. Matrix multiplications really are sequences of dot products! Linear algebra is slowly and painfully coming back to me. Attached is a transcription of huggingface's Perceiver attention that works with the same example data. The 'keys/queries/values' axis ends up being the sequence axis. They permute the tensors so the heads dimension joins the batch dimensions, which lets the dot products run as a normal matmul rather than an einsum with its string parsing.
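A quick PyTorch sanity check of the matmul-as-einsum claim (toy shapes of my own, not from the attached transcription):

```python
import torch

# A (batched) matmul contracts the last axis of the first operand with
# the second-to-last axis of the second operand; both summed-over axes
# disappear from the output.
a = torch.randn(2, 3, 4)  # (batch, rows, inner)
b = torch.randn(2, 4, 5)  # (batch, inner, cols)

via_matmul = a @ b                               # shape (2, 3, 5)
via_einsum = torch.einsum('bij,bjk->bik', a, b)  # same contraction, spelled out
assert torch.allclose(via_matmul, via_einsum)

# "Sequences of dot products": every output cell is one dot product of a
# row of a with a column of b.
assert torch.allclose(via_matmul[0, 1, 2], torch.dot(a[0, 1], b[0, :, 2]))
```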
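And the permute trick, sketched with made-up shapes (my own reconstruction of the idea, not the actual Hugging Face code):

```python
import torch

batch, heads, seq, head_dim = 2, 8, 10, 64

# Hypothetical layout: heads sit between the sequence and feature axes.
q = torch.randn(batch, seq, heads, head_dim)
k = torch.randn(batch, seq, heads, head_dim)

# Permute heads next to batch so the trailing two axes are (seq, head_dim);
# matmul batches over the leading axes and does all per-head dot products.
q_p = q.permute(0, 2, 1, 3)           # (batch, heads, seq, head_dim)
k_p = k.permute(0, 2, 1, 3)           # (batch, heads, seq, head_dim)
scores = q_p @ k_p.transpose(-2, -1)  # (batch, heads, seq, seq)

# The same contraction written as an einsum, no permute needed:
scores_einsum = torch.einsum('bqhd,bkhd->bhqk', q, k)
assert torch.allclose(scores, scores_einsum)
```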