[ot][spam][crazy][data] transformer model 'attention' improvement

k gmkarl at gmail.com
Tue Jan 25 03:08:04 PST 2022


08:  key_chunk_size = min(key_chunk_size, num_kv)

The keys and values will be split along their first dimension, so the
chunk size is capped at num_kv, the length of that dimension.
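A rough sketch of what that cap and split might look like. The shapes, the smaller chunk size, and the plain Python slicing are all assumptions for illustration; the code being copied may walk the chunks differently.

```python
import jax.numpy as jnp

# Hypothetical shapes, chosen only for illustration.
num_kv, num_heads, k_features = 8, 2, 4
key = jnp.zeros((num_kv, num_heads, k_features))

# Line 08: a requested chunk size larger than the input is capped.
key_chunk_size = 4096
key_chunk_size = min(key_chunk_size, num_kv)

# With a smaller (assumed) request, the first dimension is cut into chunks.
small_chunk = min(4, num_kv)
chunks = [key[start:start + small_chunk]
          for start in range(0, num_kv, small_chunk)]
chunk_lengths = [int(c.shape[0]) for c in chunks]
```

Without the cap, a chunk size bigger than num_kv would ask for more rows than the keys and values have.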

09:  query = query / jnp.sqrt(k_features)

# i typed a lot of comments on lines but they disappeared again.  i
plan to return to line 09 above because i'm not sure why it's there.  i
skipped the inner functions to start with, and am working on copying
over lines 30 and 31.  sending to preserve.
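One possible reading of line 09: it looks like the usual scaled dot-product attention scaling, where the query-key scores are divided by the square root of the key feature size. Dividing the query up front should give the same scores as dividing after the dot product. A small check of that, with made-up values (the arrays here are assumptions, not taken from the code being copied):

```python
import jax.numpy as jnp

# Hypothetical single query and two keys, k_features = 4.
k_features = 4
query = jnp.array([[1.0, 2.0, 3.0, 4.0]])
key = jnp.array([[0.5, -1.0, 2.0, 0.0],
                 [1.0, 1.0, 1.0, 1.0]])

# Scale the query first, as line 09 does, then take dot products.
scaled_first = (query / jnp.sqrt(k_features)) @ key.T

# Take dot products first, then scale the scores.
scaled_after = (query @ key.T) / jnp.sqrt(k_features)

same = bool(jnp.allclose(scaled_first, scaled_after))
```

If that reading is right, scaling the query once would save rescaling the scores for every key chunk.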

