[ot][spam][crazy] crazylogs: STaR

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Tue Jul 5 15:51:52 PDT 2022


i ended up spending my time enabling the language modeling accelerations.

i'm not sure whether they help or not. it looks like the total time on
my system using them, for the task that's currently coded in, is about
2 hours and 18.3 minutes.

it looks like composer is mostly designed for vision models. it has
only two text-based techniques implemented, and they are mostly for
GPT. still, it is great to have a general framework that works with
existing pretrained models, which is important for the task of
finetuning.

somewhere out there is a framework for mutating models between forms
and representations, but i have not found it.

i got distracted finding some of lucidrains' recent work; lucidrains
has an x-transformers repository at
https://github.com/lucidrains/x-transformers that lets one design a
transformer with various features described by papers enabled or
disabled. it's the most general thing i've run into so far.

the 'alibi' technique used for gpt models in composer, and available
in lucidrains' repository, lets a model handle inputs much longer than
the lengths it was trained on, because it replaces positional
embeddings with a simple distance-based penalty on attention scores.
there are a number of other techniques for extending context length,
too.
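for a concrete picture of how alibi mutates attention, here's a
minimal pure-python sketch (function names are mine, not composer's or
lucidrains'): each head gets a slope from a geometric sequence, and
each attention score gets a penalty proportional to query-key
distance, with no positional embedding at all.

```python
import math

def alibi_slopes(num_heads):
    # per-head slopes form a geometric sequence starting at
    # 2^(-8/num_heads); e.g. for 8 heads: 1/2, 1/4, ..., 1/256
    start = 2 ** (-8.0 / num_heads)
    return [start ** (i + 1) for i in range(num_heads)]

def alibi_bias(slope, seq_len):
    # causal bias matrix added to one head's attention scores:
    # 0 on the diagonal, linearly more negative with distance,
    # -inf above the diagonal to mask future positions
    return [[-slope * (i - j) if j <= i else float("-inf")
             for j in range(seq_len)]
            for i in range(seq_len)]
```

because the penalty depends only on relative distance, the same bias
rule extends to sequence positions never seen in training, which is
where the longer-input ability comes from.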

i was sad to not see anything in lucidrains' stuff yet for adding
mixture of experts to models. mixture of experts lets models
specialise contextually and use less compute at runtime, by deciding
on the fly which parts of the network to use for each input. maybe
within the next year.
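the on-the-fly decision is usually a top-k gate; here's a hedged
pure-python sketch of that routing step (the function name and shapes
are mine, for illustration only): a router scores each expert, only
the k best are selected, and a softmax over just those k gives mixing
weights.

```python
import math

def top_k_gate(logits, k=2):
    # logits: one router score per expert for a given token.
    # keep the k highest-scoring experts and renormalise their
    # weights with a softmax over just those k
    top = sorted(range(len(logits)), key=lambda i: logits[i],
                 reverse=True)[:k]
    exps = {i: math.exp(logits[i]) for i in top}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}
```

the runtime saving comes from only ever evaluating the k selected
experts' weights, rather than the whole network, for each token.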


More information about the cypherpunks mailing list