[spam] [personal] perceiver model notes

Undiscussed Horrific Abuse, One Victim & Survivor of gmkarl at gmail.com
Thu Jan 27 03:47:33 PST 2022


the data comes out right up until it's consolidated at the end of the softmax
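
for context, here is a rough sketch of what that consolidation step
looks like, assuming the usual running-max formulation of chunked
softmax; the names and structure are illustrative rather than the
library's actual code, and a real memory-efficient implementation folds
the values in per chunk instead of keeping the whole numerator around:

import torch

def softmax_in_chunks(scores, chunk_size):
    # scores: (num_queries, num_keys); walk the key axis in chunks,
    # keeping a running max so earlier chunks can be renormalized
    num_queries, num_keys = scores.shape
    running_max = torch.full((num_queries,), float('-inf'))
    numerator = torch.zeros_like(scores)
    denominator = torch.zeros(num_queries)
    for start in range(0, num_keys, chunk_size):
        chunk = scores[:, start:start + chunk_size]
        new_max = torch.maximum(running_max, chunk.max(dim=1).values)
        scale = torch.exp(running_max - new_max)
        # consolidate: rescale everything accumulated so far
        numerator[:, :start] *= scale[:, None]
        denominator *= scale
        exp_chunk = torch.exp(chunk - new_max[:, None])
        numerator[:, start:start + chunk_size] = exp_chunk
        denominator += exp_chunk.sum(dim=1)
        running_max = new_max
    # the per-chunk data only matches the plain softmax here,
    # after the final division
    return numerator / denominator[:, None]

scores = torch.randn(4, 12)
assert torch.allclose(softmax_in_chunks(scores, 5), torch.softmax(scores, dim=1))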

i stepped through it carefully, and it turns out the attention values
are being generated in a truncated manner.  there are only 20 in the
efficient_attention code, whereas there are 96 in the working code.

so, i still got something wrong. i'm guessing my test passed more
easily because it used the same feature size for all of queries, keys,
and values. that is not true in the perceiver_loader test i'm
pursuing; it looks as if the values have a feature size of 20 whereas
the keys have a feature size of 96. gotta review again to get it
producing 96 attention scores instead of 20, i guess. unsure.
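
a hypothetical shape check (torch einsum, chunked ...qhd layout from
the notes below); the sizes mirror the perceiver_loader case described
above and are assumptions for illustration:

import torch

batch, heads, head_dim = 1, 8, 32
queries = torch.randn(batch, 256, heads, head_dim)  # ...qhd
keys    = torch.randn(batch, 96, heads, head_dim)   # ...khd
values  = torch.randn(batch, 96, heads, 20)         # ...vhd, v feature size 20

scores = torch.einsum('...qhd,...khd->...qhk', queries, keys)
print(scores.shape)  # torch.Size([1, 256, 8, 96])
# the last score axis is the number of keys (96); the value feature
# size (20) should never show up there, so getting 20 scores suggests
# the wrong tensor or axis is standing in for the keys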

here are the notes with the einsum letters fleshed out, unchecked (a
mask-extension sketch follows them):

        chunked:
                queries: ...qhd
                keys: ...khd
                values: ...vhd
                scores: ...qhk
                mask: ...hqk -> ...qhk
        unchunked:
                queries: .hqd
                keys: .hkd
                values: .hvd
                scores: .hqk -> 1,8,256,96
                mask: .hqk -> 1,1,1,96 -> needs extension to num_heads, num_queries
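
a sketch of the mask extension that last note calls for, assuming the
unchunked .hqk layout (1, 8, 256, 96) given above; the surrounding
names are illustrative, not the library's:

import torch

num_heads, num_queries, num_keys = 8, 256, 96
scores = torch.randn(1, num_heads, num_queries, num_keys)  # .hqk
mask = torch.ones(1, 1, 1, num_keys, dtype=torch.bool)     # 1,1,1,96

# broadcast the singleton head and query axes up to the score shape
mask = mask.expand(1, num_heads, num_queries, num_keys)
masked_scores = scores.masked_fill(~mask, float('-inf'))
print(masked_scores.shape)  # torch.Size([1, 8, 256, 96])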


commit eb16dc63d2c617bfe708881f8bd5ba96be8b9f50 (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date:   Thu Jan 27 11:43:45 2022 +0000

    wip efficient attention: dimensions pass but data is truncated

