On 6/18/23, Karl Semich <0xloem@gmail.com> wrote:
here’s the code for the bayesian one: https://github.com/gmum/few-shot-hypernets-public/tree/master/methods/hypern...
i’m wondering if there’s more private than public research in this area; hard to say.
anyway, these papers kind of say that how the weights are encoded is a hyperparameter that hasn’t been studied enough
it seems it would make sense to tack a linear layer onto a transformer to emit the target weights (rough sketch below). the google paper uses the raw outputs of a transformer, and the hypernets paper says it uses linear heads; it’s notable, i think, that it uses generators simpler than a transformer. it’s also notable that one of them said they had to remove almost all of the parts to prevent overfitting when there was very little data
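here’s a minimal sketch of what i mean by a linear head generating weights, assuming a PyTorch-style setup. the class name, the pooled-embedding input, and the target layer shapes are all made up for illustration, not taken from either paper:

```python
import torch
import torch.nn as nn

class LinearWeightHead(nn.Module):
    """Hypothetical hypernetwork head: maps a transformer's pooled embedding
    to the flat weights and bias of one small target linear layer."""
    def __init__(self, embed_dim, target_in, target_out):
        super().__init__()
        self.target_in = target_in
        self.target_out = target_out
        # a single linear head emits every weight and bias of the target layer
        self.head = nn.Linear(embed_dim, target_out * target_in + target_out)

    def forward(self, embedding, x):
        flat = self.head(embedding)                       # (batch, out*in + out)
        w = flat[:, : self.target_out * self.target_in]
        b = flat[:, self.target_out * self.target_in :]
        w = w.view(-1, self.target_out, self.target_in)   # (batch, out, in)
        # apply the generated layer to x: (batch, in) -> (batch, out)
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + b

# hypothetical usage: 'embedding' would be the transformer's pooled output for a task/support set
head = LinearWeightHead(embed_dim=512, target_in=64, target_out=10)
logits = head(torch.randn(4, 512), torch.randn(4, 64))   # -> shape (4, 10)
```

the point is just that the “encoding” choice here (one flat vector per layer) is arbitrary; chunking, sharing, or normalizing that output differently is exactly the kind of hyperparameter the papers don’t seem to explore much.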
both papers are tackling image recognition using CNNs, which is kind of specific
i guess i’d like to try training something in ggml next, or maybe later; unsure.
note that it seems worthwhile to study meta-learning more extensively