[ot][spam][crazy][data] transformer model 'attention' improvement
Undiscussed Horrific Abuse, Victim & Survivor of
gmkarl at gmail.com
Sun Jan 30 15:21:20 PST 2022
Doing some work on getting the current state of the transformers code I
have working again with my test perceiver model that converts numbers.
These are my notes on the correct dimension shapes for the code that
ostensibly worked. I plan to compare these with the broken commit
below them.
correct shapes:
  context_layer.shape   = (1, 256, 8, 20)
  attention_probs.shape = (1, 8, 256, 96)
  values.shape          = (1, 8, 96, 20)
  queries.shape         = (1, 8, 256, 32)
  keys.shape            = (1, 8, 96, 32)
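The shapes above can be checked mechanically. Here is a minimal sketch (not
the actual perceiver code; the variable names and the numpy stand-ins for the
real tensors are my own) walking them through a standard cross-attention
matmul chain:

```python
import numpy as np

# dimensions taken from the notes above:
# batch=1, heads=8, query_len=256, key_len=96, qk_dim=32, value_dim=20
B, H, Lq, Lk, Dqk, Dv = 1, 8, 256, 96, 32, 20

queries = np.zeros((B, H, Lq, Dqk))   # 1,8,256,32
keys    = np.zeros((B, H, Lk, Dqk))   # 1,8,96,32
values  = np.zeros((B, H, Lk, Dv))    # 1,8,96,20

# attention probs come from queries @ keys^T -> 1,8,256,96
attention_probs = queries @ keys.transpose(0, 1, 3, 2)
assert attention_probs.shape == (1, 8, 256, 96)

# weighted sum of values -> 1,8,256,20
context = attention_probs @ values
assert context.shape == (1, 8, 256, 20)

# move the head axis inward to get the context layer -> 1,256,8,20
context_layer = context.transpose(0, 2, 1, 3)
assert context_layer.shape == (1, 256, 8, 20)
```

If all the asserts pass, the noted shapes are mutually consistent: 256
queries attend over 96 keys, contracting the 32-wide qk dimension and
emitting a 20-wide value per head.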
commit ca60cd579c82191d4e6696534af32e96b850015e (HEAD ->
memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date: Sun Jan 30 23:17:41 2022 +0000
commented out old perceiver code and drafted a call of the new
attentions function that does both chunked and nonchunked. currently
crashes due to dimension error.
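For reference, the invariant the new function needs to preserve can be
sketched like this. This is not the memory-efficient-attention code itself;
`attention` and `chunked_attention` are hypothetical stand-ins showing why
chunking over queries must reproduce the nonchunked result exactly (the
softmax is independent per query row):

```python
import numpy as np

def attention(q, k, v):
    # plain scaled dot-product attention over all queries at once
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return probs @ v

def chunked_attention(q, k, v, chunk=64):
    # process query chunks independently; each chunk still sees all keys,
    # so results concatenate back to the nonchunked answer exactly
    return np.concatenate(
        [attention(q[:, :, i:i + chunk], k, v)
         for i in range(0, q.shape[2], chunk)],
        axis=2)

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8, 256, 32))
k = rng.normal(size=(1, 8, 96, 32))
v = rng.normal(size=(1, 8, 96, 20))
assert chunked_attention(q, k, v).shape == (1, 8, 256, 20)
assert np.allclose(attention(q, k, v), chunked_attention(q, k, v))
```

Once the dimension error is fixed, a check like the final allclose is a
quick way to confirm the chunked path still matches the nonchunked one.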
More information about the cypherpunks mailing list