Doing some work on getting the current state of the transformers code I have working again with my test perceiver model that converts numbers. These are my notes on the correct dimension shapes from the code that ostensibly worked; I plan to compare them against the broken commit below them. (A quick sanity sketch of these shapes follows the commit log.)

correct context layer shape = 1, 256, 8, 20
attention_probs.shape = 1, 8, 256, 96
values.shape = 1, 8, 96, 20
queries.shape = 1, 8, 256, 32
keys.shape = 1, 8, 96, 32

commit ca60cd579c82191d4e6696534af32e96b850015e (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem@gmail.com>
Date:   Sun Jan 30 23:17:41 2022 +0000

    commented out old perceiver code and drafted a call of the new attentions function that does both chunked and nonchunked. currently crashes due to dimension error.
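
A minimal sanity sketch, assuming the standard multi-head attention computation (scaled dot product, softmax over the key/value length, probs @ values, then moving heads back next to the head dimension), to confirm the noted shapes are mutually consistent. The variable names here are my own, not taken from the transformers code.

import torch

batch, heads = 1, 8
q_len, kv_len = 256, 96
qk_dim, v_dim = 32, 20

queries = torch.randn(batch, heads, q_len, qk_dim)   # 1, 8, 256, 32
keys = torch.randn(batch, heads, kv_len, qk_dim)     # 1, 8, 96, 32
values = torch.randn(batch, heads, kv_len, v_dim)    # 1, 8, 96, 20

# scaled dot-product scores over the key/value length, softmaxed into probs
scores = queries @ keys.transpose(-1, -2) / qk_dim ** 0.5
attention_probs = scores.softmax(dim=-1)             # 1, 8, 256, 96

# weighted sum of values: 1, 8, 256, 20; then permute heads behind the
# sequence dim to get the context layer: 1, 256, 8, 20
context = (attention_probs @ values).permute(0, 2, 1, 3)

assert attention_probs.shape == (1, 8, 256, 96)
assert context.shape == (1, 256, 8, 20)

If the chunked and non-chunked attention paths both reproduce these asserts, the dimension error in the commit above should be somewhere in how the new attentions function is being called, not in the shape conventions themselves.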