i ended up spending my time enabling the language modeling accelerations, and i'm still not sure whether they help or not. the total time on my system with them enabled, for the task as it's currently coded, looks like about 2 hours and 18.3 minutes.

composer seems to be designed mostly for vision models. it has only two text-based techniques implemented, and they are mostly aimed at GPT-style models. still, it's great to have a general framework that works with existing pretrained models, which matters for finetuning. somewhere out there is a framework for mutating models between forms and representations, but i haven't found it.

i got distracted looking through some of lucidrains' recent work. lucidrains has an x-transformers repository at https://github.com/lucidrains/x-transformers that lets one design a transformer with various paper-described features toggled on or off. it's the most general thing i've run into so far.

the 'alibi' technique, used for gpt models in composer and also available in the x-transformers repository, lets the input length be extended well beyond what the model was trained on, because of how it mutates attention: instead of learned position embeddings, each head adds a linear distance penalty to the attention scores. there are a number of other techniques for extending context length, too.

i was sad not to see anything in lucidrains' stuff yet for adding mixture of experts to models. mixture of experts lets a model specialise contextually and use less runtime, because a router decides on the fly which expert sub-networks to run for each token. maybe within the next year.
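
for the composer part, here's roughly what enabling the text-oriented accelerations looks like. this is just a sketch: the "gpt2" model, the toy dataset, and the exact argument names for Alibi / GatedLinearUnits / HuggingFaceModel are my assumptions, so check them against the installed composer version.

```python
# a minimal sketch of turning on composer's text accelerations; assumes
# composer.algorithms.Alibi / GatedLinearUnits and composer.models.HuggingFaceModel,
# so double-check argument names against the installed version
from composer import Trainer
from composer.algorithms import Alibi, GatedLinearUnits
from composer.models import HuggingFaceModel
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = HuggingFaceModel(AutoModelForCausalLM.from_pretrained("gpt2"), tokenizer=tokenizer)

# a toy dataset of tokenized samples; swap in the real finetuning data
enc = tokenizer(["an example sentence"] * 8, padding="max_length", max_length=32,
                truncation=True, return_tensors="pt")
samples = [{"input_ids": enc["input_ids"][i],
            "attention_mask": enc["attention_mask"][i],
            "labels": enc["input_ids"][i]} for i in range(8)]

trainer = Trainer(
    model=model,
    train_dataloader=DataLoader(samples, batch_size=4),
    max_duration="1ep",
    algorithms=[
        Alibi(max_sequence_length=2048),  # swap learned positions for alibi attention biases
        GatedLinearUnits(),               # swap feedforward blocks for gated linear units
    ],
)
trainer.fit()
```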
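
and toggling features in x-transformers looks something like the below; the flag names (alibi_pos_bias, alibi_num_heads, ff_glu) are the ones documented in the repo's README, though they may drift between versions.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# a decoder-only transformer with a couple of paper-described features toggled on
model = TransformerWrapper(
    num_tokens=20000,
    max_seq_len=1024,
    attn_layers=Decoder(
        dim=512,
        depth=6,
        heads=8,
        alibi_pos_bias=True,   # alibi attention biases instead of learned positions
        alibi_num_heads=4,     # apply the bias to only some heads, as in the paper
        ff_glu=True,           # gated linear units in the feedforward blocks
    ),
)

tokens = torch.randint(0, 20000, (1, 1024))
logits = model(tokens)  # (1, 1024, 20000)
```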
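
the core of alibi is small enough to show directly: each head adds a fixed linear penalty to the attention scores based on how far back the key sits, so no position embedding is needed and longer-than-trained inputs still attend sensibly. a self-contained sketch in plain pytorch, using the power-of-two head-slope scheme from the paper (my simplification, not composer's or lucidrains' code):

```python
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """per-head linear distance penalties to add to causal attention logits."""
    # geometric head slopes from the alibi paper; assumes num_heads is a power of two
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]           # rel[i, j] = j - i, <= 0 for past keys
    return slopes[:, None, None] * rel[None]    # (heads, queries, keys); more negative further back

def attention_with_alibi(q, k, v):
    """q, k, v: (heads, seq, dim) -- plain causal attention plus the alibi bias."""
    heads, seq, dim = q.shape
    scores = q @ k.transpose(-2, -1) / dim ** 0.5 + alibi_bias(heads, seq)
    causal = torch.tril(torch.ones(seq, seq, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(8, 16, 64)
out = attention_with_alibi(q, k, v)             # (8, 16, 64)
```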
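
for the mixture-of-experts bit, the idea in miniature: a gating network scores the experts for each token and only the top-k experts actually run on that token, which is where the runtime saving comes from. a toy top-2 router in plain pytorch, with no load balancing or capacity limits; everything here is illustrative, not any particular library's implementation.

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """toy token-level mixture of experts: each token runs only its top-k experts."""
    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, num_experts)   # scores every expert for each token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x):                          # x: (tokens, dim)
        weights, chosen = self.gate(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)   # renormalise over chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (chosen == e).nonzero(as_tuple=True)
            if token_idx.numel():                  # only run the expert on its routed tokens
                out[token_idx] += weights[token_idx, slot, None] * expert(x[token_idx])
        return out

x = torch.randn(10, 64)
y = TinyMoE(64)(x)                                 # (10, 64)
```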