These are my incomplete model permutation notes, for inside the attention implementations. Each axis is labeled with an einsum letter.

chunked:
  queries: ...qhd
  keys: ...khd
  values: ...vhd
  mask: ...hqk -> ...qhk
  scores: ...qhk

unchunked:
  queries: .hqd
  keys: .hkd
  values: .hvd
  mask: .
  scores: .
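
A rough sketch of how the chunked layouts could line up in practice, using NumPy. The sizes, the variable names, the boolean mask convention (True = keep), and the -inf fill are all assumptions for illustration; the notes only give the axis letters. Only the chunked case is shown, since the unchunked mask and scores layouts are not filled in yet.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
batch, q_len, k_len, heads, dim = 2, 3, 4, 5, 6

rng = np.random.default_rng(0)
queries = rng.standard_normal((batch, q_len, heads, dim))  # ...qhd
keys = rng.standard_normal((batch, k_len, heads, dim))     # ...khd

# Mask arrives as ...hqk; swap the h and q axes to get ...qhk.
mask_hqk = rng.random((batch, heads, q_len, k_len)) > 0.5
mask_qhk = np.swapaxes(mask_hqk, -3, -2)

# Contract over d; the output letters give the scores layout ...qhk.
scores = np.einsum("...qhd,...khd->...qhk", queries, keys)

# Assumption: True means "attend", so False positions are filled with -inf.
masked_scores = np.where(mask_qhk, scores, -np.inf)
```

With the mask permuted first, it matches the scores elementwise and no extra broadcasting or reshaping is needed.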