[spam] [personal] perceiver model notes
Undiscussed Horrific Abuse, One Victim & Survivor of
gmkarl at gmail.com
Thu Jan 27 04:39:52 PST 2022
the big issue wasn't truncation; i had put the wrong block of code in
one branch of an if/else. the current challenge is that the
efficient attention implementation has no place to apply
dropout (random zeroing of some weights during training) at the point
where the perceiver model applies it. i fudged something in, but it is
untested.
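
a minimal sketch of where dropout could slot into chunked attention. this is not the code from the commit below. it assumes the perceiver applies dropout to the softmax probabilities before they are multiplied with the values. the function names, the single-query shape, and the explicit keep mask are all made up for illustration. the idea: the running softmax denominator must see every key, so the keep mask only touches the value accumulation.

```python
import math
import random


def naive_attention(q, keys, values, keep, p):
    """Reference: softmax over all keys at once, then dropout on the probabilities."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Inverted dropout: zero the dropped probabilities, rescale the kept ones.
    probs = [e / total * m / (1.0 - p) for e, m in zip(exps, keep)]
    dim = len(values[0])
    return [sum(pr * v[d] for pr, v in zip(probs, values)) for d in range(dim)]


def chunked_attention(q, keys, values, keep, p, chunk_size):
    """Same result, visiting the keys one chunk at a time with running statistics."""
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0          # softmax denominator: never sees the dropout mask
    running_out = [0.0] * dim  # numerator: the dropout mask is applied here
    for start in range(0, len(keys), chunk_size):
        end = start + chunk_size
        scores = [sum(a * b for a, b in zip(q, k)) for k in keys[start:end]]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum (0.0 on the first chunk).
        rescale = math.exp(running_max - new_max)
        exps = [math.exp(s - new_max) for s in scores]
        running_sum = running_sum * rescale + sum(exps)
        for d in range(dim):
            acc = sum(e * m * v[d]
                      for e, m, v in zip(exps, keep[start:end], values[start:end]))
            running_out[d] = running_out[d] * rescale + acc
        running_max = new_max
    scale = 1.0 / (1.0 - p)
    return [o * scale / running_sum for o in running_out]


if __name__ == "__main__":
    rng = random.Random(0)
    p = 0.25  # hypothetical dropout rate
    q = [rng.uniform(-1, 1) for _ in range(4)]
    keys = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
    values = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
    keep = [1 if rng.random() >= p else 0 for _ in range(10)]
    print(naive_attention(q, keys, values, keep, p))
    print(chunked_attention(q, keys, values, keep, p, chunk_size=3))
```

the mask is passed in explicitly so both versions can be compared on the same random draw. a real implementation would more likely draw the mask per chunk inside the loop.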

commit 7628f3e4f32ac25b11774d939f2e16a20dd2a8fd (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem at gmail.com>
Date:   Thu Jan 27 12:38:13 2022 +0000

    wip efficient attention: organising separate parts to include dropout and application
More information about the cypherpunks
mailing list