26 Jan
2022
12:04 a.m.
The first issue I have working with PerceiverSelfAttention is sorting out the HuggingFace permutations of the query, key, and value matrices. My dot products aren't producing the same attention weights, which suggests I'm not feeding the data in the right shape. They reorganise the matrices to handle multiple channels and split them into heads in a particular way. I also have trouble intuiting the relation between torch.matmul and einsum when it comes to building a matrix of dot products of feature vectors.
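A minimal sketch of the equivalence I'm trying to internalise. The shapes and the head-splitting layout here are my own assumptions for illustration, not the exact HuggingFace code, but the contraction is the same: matmul multiplies the last two dims, while einsum names the dot-product axis explicitly.

```python
import torch

# Hypothetical sizes, not taken from the Perceiver config.
batch, seq, num_heads, head_dim = 2, 5, 4, 8
hidden = num_heads * head_dim

q = torch.randn(batch, seq, hidden)
k = torch.randn(batch, seq, hidden)

def split_heads(x):
    # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
    return x.view(batch, seq, num_heads, head_dim).transpose(1, 2)

qh, kh = split_heads(q), split_heads(k)

# matmul contracts the last dim of qh with the second-to-last of kh^T:
# (batch, heads, seq, head_dim) @ (batch, heads, head_dim, seq)
scores_matmul = torch.matmul(qh, kh.transpose(-1, -2))

# einsum spelling of the same thing: dot product over the feature axis d,
# leaving a (seq_i x seq_j) matrix of dot products per batch and head.
scores_einsum = torch.einsum('bhid,bhjd->bhij', qh, kh)

assert torch.allclose(scores_matmul, scores_einsum)
```

If the assert holds but the weights still differ from HuggingFace's, the mismatch is presumably in the view/transpose ordering before the contraction, not in the contraction itself.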