26 Jan
2022
12:04 a.m.
The first issue I have working with PerceiverSelfAttention is sorting out the HuggingFace permutations of the query, key, and value matrices. My dot products aren't producing the same attention weights, which suggests I'm not feeding the data in the right shape. They reorganise the matrices to handle multiple channels and split them into heads in a particular way. I also have trouble intuiting the relation between torch.matmul and einsum when it comes to building a matrix of dot products of feature vectors.
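A minimal sketch of the equivalence I'm trying to internalise. The shapes and the head-splitting layout here are my own assumptions for illustration, not the exact HuggingFace code, but the contraction is the same: matmul multiplies the last two dims, while einsum names the dot-product axis explicitly.

```python
import torch

# Hypothetical sizes, not taken from the Perceiver config.
batch, seq, num_heads, head_dim = 2, 5, 4, 8
hidden = num_heads * head_dim

q = torch.randn(batch, seq, hidden)
k = torch.randn(batch, seq, hidden)

def split_heads(x):
    # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
    return x.view(batch, seq, num_heads, head_dim).transpose(1, 2)

qh, kh = split_heads(q), split_heads(k)

# matmul contracts the last dim of qh with the second-to-last of kh^T:
# (batch, heads, seq, head_dim) @ (batch, heads, head_dim, seq)
scores_matmul = torch.matmul(qh, kh.transpose(-1, -2))

# einsum spelling of the same thing: dot product over the feature axis d,
# leaving a (seq_i x seq_j) matrix of dot products per batch and head.
scores_einsum = torch.einsum('bhid,bhjd->bhij', qh, kh)

assert torch.allclose(scores_matmul, scores_einsum)
```

If the assert holds but the weights still differ from HuggingFace's, the mismatch is presumably in the view/transpose ordering before the contraction, not in the contraction itself.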