30 Jan 2022, 6:20 p.m.
I'm pretty sure the "n" in time * memory = O(n^2) refers to the number of keys/queries, which is the same for both in self-attention (it is the sequence length). The batch dimension is separate: it's something the user picks to stay within memory bounds.
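To make that concrete, here's a rough sketch that just counts the query-key pairs that have to be scored. The function names and the example sizes are made up for illustration, and it ignores heads and the feature dimension. The point is only that the count grows quadratically in the sequence length but linearly in the batch size.

```python
def attention_score_entries(batch_size: int, num_queries: int, num_keys: int) -> int:
    """Count query-key pairs scored across a batch (hypothetical helper)."""
    return batch_size * num_queries * num_keys


def self_attention_score_entries(batch_size: int, seq_len: int) -> int:
    # In self-attention, queries and keys come from the same sequence,
    # so both counts equal seq_len and the total is batch_size * seq_len**2.
    return attention_score_entries(batch_size, seq_len, seq_len)


# Example sizes are arbitrary assumptions, chosen only to show the scaling.
base = self_attention_score_entries(batch_size=8, seq_len=1024)
double_n = self_attention_score_entries(batch_size=8, seq_len=2048)
double_batch = self_attention_score_entries(batch_size=16, seq_len=1024)

print(double_n // base)      # 4: doubling n quadruples the pair count
print(double_batch // base)  # 2: doubling the batch only doubles it
```

So n is what drives the O(n^2) term, while the batch size is just a linear multiplier you can turn down if you run out of memory.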