20 Jan 2022
1:33 p.m.
i watched the perceiver run for a bit. i tried reducing the learning rate, which is way too high. i'm not sure why the characters all trend toward the same values (even at the start). reviewing the data flow might help, though it could take several passes through different parts before the issue turns up; experience with transformer models would likely help too. my hint that something is miswired is that the values trend toward 0 early on, whereas if the attention mask were being respected there shouldn't be any pull toward 0, since most of the 0s are masked out.
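
as a reminder of what "respected" should look like, a minimal sanity check (my own sketch, assuming standard scaled dot-product attention with a boolean key-padding mask applied via masked_fill before the softmax; the function and shapes are illustrative, not this repo's actual code): masked keys should get ~zero attention weight, so padding can't pull the output toward 0.

```python
import torch

def masked_attention(q, k, v, key_mask):
    # q: (batch, n_q, d); k, v: (batch, n_k, d)
    # key_mask: (batch, n_k) bool, True where the key is real (not padding)
    scores = torch.einsum('bqd,bkd->bqk', q, k) / q.shape[-1] ** 0.5
    # set scores for padded keys to -inf so softmax assigns them zero weight
    scores = scores.masked_fill(~key_mask[:, None, :], float('-inf'))
    weights = scores.softmax(dim=-1)
    return weights @ v, weights

# sanity check: padded positions should receive ~zero attention weight
batch, n_q, n_k, d = 2, 4, 8, 16
q, k, v = (torch.randn(batch, n, d) for n in (n_q, n_k, n_k))
mask = torch.ones(batch, n_k, dtype=torch.bool)
mask[:, n_k // 2:] = False  # pretend the second half is padding
out, w = masked_attention(q, k, v, mask)
assert w[..., n_k // 2:].max() < 1e-6  # masked keys contribute nothing
```

if the real model fails a check like this, the wiring (mask shape, broadcast, or where it's applied) is the first place to look.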