my existing work is at https://github.com/xloem/rnlp . i just got composer_nb_alter_2.py to work: i took an existing training example for a masked language model and adapted it to a generative language model. the smallest pretrained generative models i found are larger than the masked ones, so training takes forever on a cpu because the model won't fit on my gpu. otherwise, finetuning a pretrained model is a fast and effective operation.

we don't actually need a generative model for this. STaR can be done with a classifier: the model doesn't need to write out the answer, just pick it. however, the authors of the paper used a generative model. i'm thinking about generalising my code to use either a generative model or a classifier, since the two setups are very similar. i was likely already planning to do this.
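as a rough sketch of what that generalisation might look like (not the actual rnlp code; the load_model helper and the model names below are just hypothetical placeholders), the switch could be as small as choosing which head to load, with the rest of the finetuning loop shared:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

def load_model(kind: str, name: str, num_choices: int = 4):
    """load either a generative head or a classification head for STaR-style finetuning.

    hypothetical helper for illustration only.
    """
    tokenizer = AutoTokenizer.from_pretrained(name)
    if kind == "generative":
        # generative: the model writes out the rationale and the answer as text
        model = AutoModelForCausalLM.from_pretrained(name)
    elif kind == "classifier":
        # classifier: the model only picks among the answer choices
        model = AutoModelForSequenceClassification.from_pretrained(
            name, num_labels=num_choices
        )
    else:
        raise ValueError(f"unknown model kind: {kind}")
    return tokenizer, model

# example placeholder checkpoints, not necessarily the ones i'd use:
# tokenizer, model = load_model("generative", "distilgpt2")
# tokenizer, model = load_model("classifier", "distilbert-base-uncased")
```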