[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Tue Jan 25 12:31:20 PST 2022


I've got my local code working with the mgrid data.  I had made at least
two bugs: an incorrect einsum, and dotting the values with the raw
weights rather than with their exponentials.  The mgrid data doesn't
exercise the softmax, since every vector has the same maximum.  Time to
figure out how to make random tensors in JAX.
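A minimal sketch of making random test tensors with `jax.random`, and of using them to check a chunked attention against a plain softmax reference. It assumes the local code processes keys in chunks while keeping a running maximum for a numerically stable softmax; `chunked_attention` and `reference_attention` are hypothetical names for illustration, not the actual local code. Random queries and keys give each row a different maximum, so the max-subtraction that the mgrid data skipped actually gets exercised.

```python
import jax
import jax.numpy as jnp


def reference_attention(q, k, v):
    # Plain attention: q is [n_q, d], k is [n_k, d], v is [n_k, d_v].
    scores = jnp.einsum('qd,kd->qk', q, k)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum('qk,kv->qv', weights, v)


def chunked_attention(q, k, v, chunk_size):
    # Hypothetical chunked version: walks over the keys in chunks, keeping a
    # running max, a running sum of exponentials, and a running weighted sum.
    n_q, n_k = q.shape[0], k.shape[0]
    running_max = jnp.full((n_q,), -jnp.inf)
    running_sum = jnp.zeros((n_q,))
    running_out = jnp.zeros((n_q, v.shape[-1]))
    for start in range(0, n_k, chunk_size):
        k_c = k[start:start + chunk_size]
        v_c = v[start:start + chunk_size]
        s = jnp.einsum('qd,kd->qk', q, k_c)
        new_max = jnp.maximum(running_max, s.max(axis=-1))
        # Dot the values with the exponentials, not with the raw scores.
        e = jnp.exp(s - new_max[:, None])
        # Rescale earlier partial results to the new maximum
        # (exp(-inf) = 0 on the first chunk, so nothing stale carries over).
        correction = jnp.exp(running_max - new_max)
        running_sum = running_sum * correction + e.sum(axis=-1)
        running_out = (running_out * correction[:, None]
                       + jnp.einsum('qk,kv->qv', e, v_c))
        running_max = new_max
    return running_out / running_sum[:, None]


# Random tensors: split one PRNG key into independent subkeys.
key = jax.random.PRNGKey(0)
key_q, key_k, key_v = jax.random.split(key, 3)
q = 3.0 * jax.random.normal(key_q, (4, 8))  # scaled up for sharper softmaxes
k = jax.random.normal(key_k, (16, 8))
v = jax.random.normal(key_v, (16, 8))

out = chunked_attention(q, k, v, chunk_size=4)
```

JAX has no global random state, so every call to `jax.random.normal` takes an explicit key; `jax.random.split` is the usual way to get fresh keys rather than reusing one.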


More information about the cypherpunks mailing list