[spam] [personal] perceiver model notes

k gmkarl at gmail.com
Fri Jan 21 09:28:14 PST 2022


regarding picking model size, i'm vaguely considering starting with
small models, then duplicating the data and doubling the size,
putting the old data at the start.  the assumption is that small
models find rough information that could be useful as input to larger
models, and that the doubled width will leave room for new information
to still progress down the model.

- could make sense to test that approach with this number problem.
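
a rough sketch of the doubling idea, under assumptions that are not in
the notes above: "data" is read as the trained weights, a single dense
layer stands in for a perceiver block, and widen_linear is a made-up
helper name.  the old weights go in the leading block of the doubled
matrix, and the block that would mix new inputs into old outputs is
zeroed so the first half of the output starts out identical to the
small model.

```python
import numpy as np

def widen_linear(weight, bias, rng, scale=0.01):
    """Double a dense layer's width, keeping the old parameters at the start.

    weight has shape (out_dim, in_dim) and bias has shape (out_dim,).
    Returns a (2*out_dim, 2*in_dim) weight and a (2*out_dim,) bias.
    Hypothetical helper for illustration only.
    """
    out_dim, in_dim = weight.shape
    # new capacity starts as small random values
    new_weight = rng.normal(0.0, scale, size=(2 * out_dim, 2 * in_dim))
    new_bias = np.zeros(2 * out_dim)
    # old parameters go in the leading block
    new_weight[:out_dim, :in_dim] = weight
    new_bias[:out_dim] = bias
    # assumption: zero the block mapping new inputs onto old outputs, so the
    # first half of the output initially matches the small model exactly
    new_weight[:out_dim, in_dim:] = 0.0
    return new_weight, new_bias

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
b = rng.normal(size=4)
big_w, big_b = widen_linear(w, b, rng)

x = rng.normal(size=3)
# old input first, new features after it
x_wide = np.concatenate([x, rng.normal(size=3)])
small_out = w @ x + b
big_out = big_w @ x_wide + big_b
```

the second half of big_out is where the wider model is free to learn
something new; whether zeroing that one block is the right choice is a
guess and would need testing on the number problem.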

still not sure how to do learning rate effectively, kind of want to
train an optimizer out of a perceiver and have it handle the
problem.


More information about the cypherpunks mailing list