[ot][spam][crazy][data] transformer model 'attention' improvement
k
gmkarl at gmail.com
Tue Jan 25 03:08:04 PST 2022
08: key_chunk_size = min(key_chunk_size, num_kv)
num_kv is the size of the first dimension of the keys and values, which
is the dimension that will be split into chunks, so the chunk size is
capped at it.
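
A minimal sketch of what the cap on line 08 allows, assuming the keys and values are shaped (num_kv, num_heads, k_features). The helper name split_kv_chunks, the plain Python loop, and the num_heads axis are assumptions for illustration only; they are not taken from the code being read.

```python
import jax.numpy as jnp

def split_kv_chunks(key, value, key_chunk_size):
    # Hypothetical helper: slices keys and values along their first axis.
    num_kv = key.shape[0]
    # Same cap as line 08: a chunk can never be longer than the data.
    key_chunk_size = min(key_chunk_size, num_kv)
    chunks = []
    for start in range(0, num_kv, key_chunk_size):
        stop = start + key_chunk_size
        # The last chunk may be shorter when num_kv is not a multiple.
        chunks.append((key[start:stop], value[start:stop]))
    return key_chunk_size, chunks

# Assumed shapes: (num_kv, num_heads, k_features)
key = jnp.ones((10, 2, 4))
value = jnp.ones((10, 2, 4))

capped_size, one_chunk = split_kv_chunks(key, value, 4096)
small_size, three_chunks = split_kv_chunks(key, value, 4)
```

With a requested chunk size larger than num_kv, the cap makes the whole input a single chunk; with a smaller one, the first dimension is cut into several pieces.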
09: query = query / jnp.sqrt(k_features)
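
One possible reading of line 09, offered as a sketch rather than the author's explanation: standard scaled dot-product attention divides the query-key dot products by sqrt(k_features), and dividing the query first gives the same logits. The function names, the einsum layout, and the (length, num_heads, k_features) shapes below are assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def logits_scaled_query(query, key):
    # Scale the query first, as on line 09, then take dot products.
    k_features = query.shape[-1]
    query = query / jnp.sqrt(k_features)
    return jnp.einsum('qhd,khd->qhk', query, key)

def logits_scaled_after(query, key):
    # Take dot products first, then scale the resulting logits.
    k_features = query.shape[-1]
    return jnp.einsum('qhd,khd->qhk', query, key) / jnp.sqrt(k_features)

rng_q, rng_k = jax.random.split(jax.random.PRNGKey(0))
query = jax.random.normal(rng_q, (3, 2, 8))  # assumed (num_q, num_heads, k_features)
key = jax.random.normal(rng_k, (5, 2, 8))    # assumed (num_kv, num_heads, k_features)

a = logits_scaled_query(query, key)
b = logits_scaled_after(query, key)
same = bool(jnp.allclose(a, b, atol=1e-5))
```

Scaling the query once up front means the division does not have to be repeated for every key chunk.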
# i typed a lot of comments on these lines but they disappeared again. i
plan to return to line 09 above because i'm not sure why it is there. i
skipped the inner functions to start with, and am working on copying
over lines 30 and 31. sending now to preserve what i have.