26 Jan 2022
11:20 p.m.
these are my incomplete notes on the model's axis permutations inside the attention implementations. each axis is labeled with an einsum letter.

chunked:
  queries: ...qhd
  keys:    ...khd
  values:  ...vhd
  mask:    ...hqk -> ...qhk
  scores:  ...qhk

unchunked:
  queries: .hqd
  keys:    .hkd
  values:  .hvd
  mask:    .
  scores:  .
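a minimal NumPy sketch of how the chunked layouts fit together, assuming standard scaled dot-product softmax attention. the notes don't say which scaling or mask convention the implementations use, so those are assumptions: `chunked_attention` is a hypothetical name, the mask is assumed boolean with True meaning "attend", scores are scaled by 1/sqrt(d), and the value axis v is assumed to have the same length as the key axis k, since the two get contracted together.

```python
import numpy as np


def chunked_attention(queries, keys, values, mask):
    """Hypothetical sketch of attention in the chunked layout.

    queries: ...qhd, keys: ...khd, values: ...vhd (v same length as k)
    mask:    ...hqk, boolean, True = attend (assumed convention)
    returns: ...qhd
    Assumes every query attends to at least one key (else the row is NaN).
    """
    d = queries.shape[-1]
    # scores ...qhk: contract the feature axis d (1/sqrt(d) scaling is an assumption)
    scores = np.einsum('...qhd,...khd->...qhk', queries, keys) / np.sqrt(d)
    # mask arrives as ...hqk; move h next to k to get ...qhk
    # (same permutation as the einsum string '...hqk->...qhk')
    mask_qhk = np.moveaxis(mask, -3, -2)
    scores = np.where(mask_qhk, scores, -np.inf)
    # numerically stable softmax over the key axis k (the last axis)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # contract the key axis of the weights against the value axis v
    return np.einsum('...qhv,...vhd->...qhd', weights, values)
```

since every einsum string starts with `...`, the same code works for any number of leading batch or chunk axes.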