23 Jul 2022
11:29 a.m.
the final trained adapter is only a few megabytes, despite the model itself being several gigabytes. a nagging part of me keeps coming back to the pretraining content of the model. i'm not sure what t5 models are trained on, but i imagine a generative or masked-language-modelling objective would give a model more general-purpose knowledge and broader language understanding than something trained purely on translation- or summarization-style tasks, where word-mapping without much comprehension could get you very far. i should probably look this up at some point. but anyway, i'm thinking of switching to xlnet.
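a rough back-of-the-envelope sketch of why the adapter checkpoint stays so small: only the little bottleneck layers get trained and saved, while the frozen base weights never leave the original checkpoint. the numbers below (hidden size, block count, bottleneck dim, t5-large-ish param count) are illustrative guesses, not measurements from my actual run.

```python
# rough back-of-the-envelope: size of bottleneck adapter weights vs. the frozen base model.
# all numbers here are illustrative assumptions, not taken from my real setup.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """down-project -> nonlinearity -> up-project, added residually after a sublayer."""
    def __init__(self, hidden_size: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))

hidden = 1024   # t5-large hidden size (assumed model variant)
blocks = 48     # 24 encoder + 24 decoder blocks
adapter_params = sum(p.numel() for p in BottleneckAdapter(hidden).parameters()) * blocks
base_params = 770_000_000  # t5-large, roughly

bytes_per_param = 4  # fp32 checkpoint
print(f"adapter-only checkpoint ~ {adapter_params * bytes_per_param / 1e6:.1f} MB")
print(f"full model checkpoint   ~ {base_params * bytes_per_param / 1e9:.1f} GB")
```

with a bottleneck of 16 that works out to roughly 6–7 MB of adapter weights against ~3 GB for the full model, which is about the ratio i'm seeing.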