27 Jan
2022
9:39 p.m.
the big issue wasn't truncation; it was that i had put the wrong block of code in an if/else condition. the current challenge is that the efficient attention implementation doesn't provide for applying dropout (random zeroing of some activations during training) where the perceiver model applies it. i fudged something in, untested.

commit 7628f3e4f32ac25b11774d939f2e16a20dd2a8fd (HEAD -> memory-efficient-attention, xloem/memory-efficient-attention)
Author: xloem <0xloem@gmail.com>
Date:   Thu Jan 27 12:38:13 2022 +0000

    wip efficient attention: organising separate parts to include dropout and application
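for reference, here's a minimal numpy sketch (not the code from the commit, and the chunking is simplified with a precomputed global max rather than a running one) of one way dropout can be folded into chunk-wise memory-efficient attention: apply the dropout mask to each chunk's unnormalised weights and normalise once at the end, which matches dropping the full softmax probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))    # queries
k = rng.normal(size=(16, 8))   # keys
v = rng.normal(size=(16, 8))   # values
p_drop = 0.1
keep = 1.0 - p_drop
# one shared dropout mask over the attention weights (queries x keys)
mask = (rng.random((4, 16)) > p_drop).astype(float)

# reference: full softmax attention, dropout applied to the probabilities
s = q @ k.T
probs = np.exp(s - s.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
ref = (probs * mask / keep) @ v

# chunked: walk the keys/values in chunks, applying the matching mask
# slice to the unnormalised weights; normalise once after the loop
chunk = 4
m = s.max(axis=1, keepdims=True)   # simplification: global max known upfront
num = np.zeros((4, 8))
den = np.zeros((4, 1))
for i in range(0, 16, chunk):
    w = np.exp(q @ k[i:i + chunk].T - m)            # unnormalised weights
    num += (w * mask[:, i:i + chunk] / keep) @ v[i:i + chunk]
    den += w.sum(axis=1, keepdims=True)
out = num / den

# both paths give the same dropped-out attention output
assert np.allclose(out, ref)
```

the key point is that dropout only touches the numerator: since each softmax probability is just an unnormalised weight divided by the row sum, masking the weights chunk by chunk and dividing by the full sum afterwards is equivalent to masking the probabilities.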