[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Tue Jan 25 16:04:43 PST 2022


The first issue I have working with PerceiverSelfAttention is sorting
out the huggingface permutations of the query, key, and value
matrices.  The dot products aren't producing the same attention
weights, which indicates I'm not providing the data in the right
shape.  They reorganise the matrices to handle multiple channels, and
split them into heads a certain way.  I also have trouble intuiting
the relation between torch.matmul and einsum when computing a matrix
of dot products between feature vectors.
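As a sanity check on that last point, here is a minimal sketch (not
the actual HuggingFace Perceiver code; the tensor names and sizes are
illustrative) showing that a matmul over head-split tensors and an
explicit einsum compute the same matrix of dot products:

import torch

# Illustrative sizes, not taken from any Perceiver config.
batch, seq, heads, head_dim = 2, 5, 4, 8

q = torch.randn(batch, seq, heads * head_dim)
k = torch.randn(batch, seq, heads * head_dim)

# The usual transformers-style head split: reshape the channel axis
# into (heads, head_dim), then move heads in front of the sequence axis.
def split_heads(x):
    return x.view(batch, seq, heads, head_dim).permute(0, 2, 1, 3)

qh, kh = split_heads(q), split_heads(k)

# matmul form: (batch, heads, seq, head_dim) @ (batch, heads, head_dim, seq)
scores_matmul = torch.matmul(qh, kh.transpose(-1, -2))

# einsum form: an explicit dot product over the feature axis d for
# every pair of positions (i, j) within each head.
scores_einsum = torch.einsum('bhid,bhjd->bhij', qh, kh)

assert torch.allclose(scores_matmul, scores_einsum)

If the view/permute step is done in a different order than the library
expects, the channels get scrambled across heads, and the dot products
come out different even though the overall shapes still match -- which
would produce exactly the mismatched weights described above.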

