26 Jan
2022
11:19 p.m.
i'm working through the axis permutations inside the huggingface perceiver and the main torch implementation of efficient attention. i have two scripts that call them, so i can step through and map the offsets. it's very hard for me to reason about the axis permutations. the scripts are attached.

perceiver_loader.py also works as an interactive loader for the model generated by the line in the previous email. running it on a trained model shows that training fits many thousands of numbers but still fails on rare ones, especially small numbers, which have only a few examples in the data. it also fails when the input format changes, for example when the word 'and' or a hyphen is added.
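one way to make the offset mapping concrete is to compute it by hand for a tiny tensor. the sketch below is not taken from the attached scripts. it is plain python and assumes a row-major (contiguous) layout, a hypothetical attention-style shape of (batch, seq, heads, head_dim), and the common permutation (0, 2, 1, 3) that moves the heads axis ahead of the sequence axis. the helper names `row_major_strides` and `permute_offsets` are made up for illustration.

```python
from itertools import product


def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major array with this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def permute_offsets(shape, perm):
    """For each flat offset in the permuted, re-packed layout, give the flat
    offset of the same element in the original layout.

    New axis i is original axis perm[i], the same convention that
    torch.Tensor.permute uses.
    """
    src_strides = row_major_strides(shape)
    new_shape = tuple(shape[p] for p in perm)
    mapping = []
    # product() varies the last index fastest, i.e. row-major order, so the
    # position in `mapping` is the flat offset in the permuted layout.
    for new_index in product(*(range(n) for n in new_shape)):
        src_offset = sum(
            new_index[i] * src_strides[perm[i]] for i in range(len(perm))
        )
        mapping.append(src_offset)
    return new_shape, mapping


# Assumed example layout: (batch, seq, heads, head_dim)
shape = (1, 2, 2, 3)
perm = (0, 2, 1, 3)  # -> (batch, heads, seq, head_dim)

new_shape, mapping = permute_offsets(shape, perm)
print(new_shape)  # (1, 2, 2, 3)
print(mapping)    # [0, 1, 2, 6, 7, 8, 3, 4, 5, 9, 10, 11]
```

reading the output: each run of three consecutive offsets is one head_dim vector, and the runs are reordered so that all positions for head 0 come before head 1. comparing a table like this against what a debugger shows at each step may make the permutations easier to follow.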