21 Jan
2022
5:28 p.m.
regarding picking model size, i'm vaguely considering starting with small models, then duplicating the data and doubling the size, putting the old data at the start. the assumption is that small models find rough information that could be useful as input to larger models, and that the doubled width will leave room for new information to still progress down the model. it could make sense to test that approach with this number problem. i'm still not sure how to handle learning rate effectively; i kind of want to train an optimizer out of a perceiver and have it handle the problem.
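a rough sketch of what the doubling step might look like, reading "data" as the trained weights of a single dense layer (that reading is an assumption, as are the hypothetical name `double_width`, the `init_scale` parameter, and the choice of small random values for the new entries). the old values land in the leading block of the larger arrays:

```python
import numpy as np

def double_width(weight, bias, rng, init_scale=0.01):
    """Double both dimensions of a dense layer (hypothetical helper).

    The trained values are copied into the leading block; the remaining
    entries get a small random init (an assumption, not a tested choice).
    """
    out_dim, in_dim = weight.shape
    new_weight = rng.normal(0.0, init_scale, size=(2 * out_dim, 2 * in_dim))
    new_weight[:out_dim, :in_dim] = weight  # old weights go at the start
    new_bias = np.zeros(2 * out_dim)
    new_bias[:out_dim] = bias               # old biases go at the start
    return new_weight, new_bias

rng = np.random.default_rng(0)
small_w = rng.normal(size=(4, 3))  # stand-in for a trained small layer
small_b = rng.normal(size=4)
big_w, big_b = double_width(small_w, small_b, rng)
```

this does not keep the larger model's outputs identical to the small model's, since the new random entries mix into the old units; zeroing the off-diagonal blocks would be one way to get closer to that if it turns out to matter.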