[spam] [personal] perceiver model notes
Undiscussed Horrific Abuse, One Victim & Survivor of
gmkarl at gmail.com
Thu Jan 27 03:47:33 PST 2022
the data now comes out right up until it's consolidated at the end of the softmax.
i stepped through it carefully, and it turns out the attention values
are being generated in a truncated manner. there are only 20 in the
efficient_attention code, whereas there are 96 in the working code.
so, i still got something wrong. i'm guessing my test passed more
easily because it had the same feature size for all of queries, keys,
and values. that is not true in the perceiver_loader test i'm
pursuing; it looks as if the values have a feature size of 20
whereas the keys have a feature size of 96. i need to review the code
again so that it produces 96 attention scores instead of 20, i guess. unsure.
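a minimal sketch of how the sizes relate in plain dot-product attention, assuming NumPy and made-up sizes (the query/key feature size of 32 is an assumption, not taken from the perceiver_loader test). this is not the efficient_attention code; it only illustrates that the last axis of the scores has one entry per key, while the value feature size only shows up in the output:

```python
import numpy as np

# Hypothetical sizes, chosen to echo the numbers in these notes.
batch, heads, num_queries, num_keys = 1, 8, 256, 96
qk_features, v_features = 32, 20  # assumed; the two do not have to match

rng = np.random.default_rng(0)
queries = rng.normal(size=(batch, heads, num_queries, qk_features))
keys = rng.normal(size=(batch, heads, num_keys, qk_features))
values = rng.normal(size=(batch, heads, num_keys, v_features))

# One score per (query, key) pair: the last axis has num_keys entries.
scores = np.einsum('bhqd,bhkd->bhqk', queries, keys) / np.sqrt(qk_features)

# Numerically stable softmax over the key axis.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# The value feature size only appears in the output.
output = np.einsum('bhqk,bhkf->bhqf', weights, values)

print(scores.shape)  # (1, 8, 256, 96)
print(output.shape)  # (1, 8, 256, 20)
```

if a test uses the same size for every one of these axes, a mix-up between the key axis and a feature axis would still produce arrays of the expected shape, which may be why such a test passes more easily.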
here are the notes with the einsum letters fleshed out, unchecked:
chunked:
queries: ...qhd
keys: ...khd
values: ...vhd
scores: ...qhk
mask: ...hqk -> ...qhk
unchunked:
queries: .hqd
keys: .hkd
values: .hvd
scores: .hqk -> 1,8,256,96
mask: .hqk -> 1,1,1,96 -> needs extension to num_heads, num_queries
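a rough sketch of the two layouts above and of the mask extension, assuming NumPy, random data, and a feature size of 32 that is not from the real test. the variable names are made up for illustration, and the padding pattern in the mask is invented:

```python
import numpy as np

# Sizes from the notes; the feature size is an assumption.
batch, heads, num_queries, num_keys, features = 1, 8, 256, 96, 32

rng = np.random.default_rng(0)
q_hqd = rng.normal(size=(batch, heads, num_queries, features))
k_hkd = rng.normal(size=(batch, heads, num_keys, features))

# Unchunked layout: .hqd, .hkd -> .hqk
scores_hqk = np.einsum('...hqd,...hkd->...hqk', q_hqd, k_hkd)

# Chunked layout: ...qhd, ...khd -> ...qhk
q_qhd = np.swapaxes(q_hqd, -3, -2)
k_khd = np.swapaxes(k_hkd, -3, -2)
scores_qhk = np.einsum('...qhd,...khd->...qhk', q_qhd, k_khd)

# Mask as it arrives: one flag per key, singleton head and query axes.
mask = np.ones((batch, 1, 1, num_keys), dtype=bool)
mask[..., -10:] = False  # pretend the last 10 keys are padding

# Extend to num_heads and num_queries (a read-only view, no copy).
mask_hqk = np.broadcast_to(mask, (batch, heads, num_queries, num_keys))

# Move from .hqk to ...qhk to match the chunked scores.
mask_qhk = np.swapaxes(mask_hqk, -3, -2)

# Masked-out keys get a very negative score before the softmax.
big_neg = np.finfo(scores_qhk.dtype).min
masked_qhk = np.where(mask_qhk, scores_qhk, big_neg)

print(scores_hqk.shape)  # (1, 8, 256, 96)
print(mask_qhk.shape)    # (1, 256, 8, 96)
```

the two score arrays should agree once the head and query axes are swapped, so comparing them is one way to check the einsum letters.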
commit eb16dc63d2c617bfe708881f8bd5ba96be8b9f50 (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date: Thu Jan 27 11:43:45 2022 +0000
wip efficient attention: dimensions pass but data is truncated