[ot][spam][crazy] adapters for semibalanced trees?

Undiscussed Groomed for Male Slavery, One Victim of Many gmkarl at gmail.com
Sat Jul 23 12:58:27 PDT 2022


i ran it again until colab logged me out. the loss dropped to 0.7 or so.
apparently colab lets you have a gpu again if you disable background execution.

i'm running it some more, just to make effective use of the time.

i looked into how longt5 works. basically, it locally contextualises
regions of its input, but not regions of its output during generation
(changing this would simply be a flag, but it is how they pretrained
it). so it is good at reading very long things and then outputting
very short things that conclude from them. it is also documented as
having a limit of 16k input tokens, so it is not fully general.
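
for concreteness, here's a minimal sketch of that shape in
huggingface transformers. the checkpoint name is one of google's
public long-t5 releases; the lengths are illustrative assumptions,
not values from my notebook:

    # minimal sketch: very long encoder input, short generated output.
    # "google/long-t5-tglobal-base" is a public hub checkpoint; the
    # lengths below are illustrative, not tuned values.
    from transformers import AutoTokenizer, LongT5ForConditionalGeneration

    name = "google/long-t5-tglobal-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = LongT5ForConditionalGeneration.from_pretrained(name)

    long_document = "lots of commit text " * 4000  # stand-in long input

    # the encoder contextualises up to ~16k tokens of input
    inputs = tokenizer(long_document, return_tensors="pt",
                       truncation=True, max_length=16384)

    # decoding is ordinary short autoregressive generation
    output_ids = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))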

while working i added a tokenizer hack to handle things like
linebreaks. i haven't tested it yet, since the unsupervised training
(i'm calling it grooming to help stuff) is effective whether the data
is perfect or not. [unsupervised grooming is possibly a larger issue.]
i also added some stubs for other models: xnn and transformer-xl,
which i found the repo for. unfortunately, transformer-xl uses a
different kind of tokenizer that my hack doesn't quite work for.
still, the time spent training another adapter would give space to
figure out how to make the other tokenizer work.
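
here's roughly the shape of that hack, reconstructed as a sketch; the
<n> sentinel and helper names are my illustration, not the committed
code:

    # sketch: make a sentencepiece tokenizer (t5-style) round-trip
    # linebreaks by mapping them to an added sentinel token.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/long-t5-tglobal-base")

    NEWLINE = "<n>"  # hypothetical sentinel standing in for "\n"
    tokenizer.add_tokens([NEWLINE])
    # the model needs a matching one-time resize:
    # model.resize_token_embeddings(len(tokenizer))

    def encode(text):
        # t5's sentencepiece tokenizer otherwise drops linebreaks
        return tokenizer(text.replace("\n", NEWLINE), return_tensors="pt")

    def decode(ids):
        # added tokens survive skip_special_tokens=True; spacing around
        # the sentinel may need cleanup in practice
        return tokenizer.decode(ids, skip_special_tokens=True).replace(NEWLINE, "\n")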

i think what makes sense next for me, after reviewing the details of
this longt5 model i've spent a couple days on, is to find a way to
combine the different commit files into one input. this would make
the model much more effective, as it could learn the relationships
between files rather than memorising which files are in a repository
in order to output updates for individual files.
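
something shaped like this, say, with a made-up <file> marker:

    # hypothetical packing of one commit's changed files into a single
    # encoder input; the "<file>" marker format is invented for illustration
    def pack_commit(files):
        # files: dict mapping path -> new file contents
        return "\n".join(f"<file> {path}\n{text}"
                         for path, text in files.items())

    packed = pack_commit({
        "README.md": "usage notes ...",
        "src/main.py": "print('hello')",
    })
    # `packed` then goes through the tokenizer as one long input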

i also found that the huggingface interface to longt5 lets you
'prompt' the t5 model with initial decoder ids. so if the model
accepted all relevant files as input, you could prompt it with each
separate file to produce output for each one in smaller bundles,
since it has a much smaller output window than input window.
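
a sketch of what that could look like; the prefix format and file
names are made up, and whether the start token must be prepended by
hand depends on the transformers version:

    # sketch: generation continued from a chosen decoder prefix, so one
    # long packed input can yield separate per-file outputs.
    import torch
    from transformers import AutoTokenizer, LongT5ForConditionalGeneration

    name = "google/long-t5-tglobal-base"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = LongT5ForConditionalGeneration.from_pretrained(name)

    inputs = tokenizer("<file> src/main.py ... whole packed commit ...",
                       return_tensors="pt")

    # decoder prefix naming the file we want output for; drop the
    # trailing </s> the tokenizer appends, and prepend t5's decoder
    # start token (the pad token) by hand to be safe
    prefix = tokenizer("<file> src/main.py",
                       return_tensors="pt").input_ids[:, :-1]
    start = torch.tensor([[model.config.decoder_start_token_id]])
    decoder_input_ids = torch.cat([start, prefix], dim=-1)

    out = model.generate(**inputs, decoder_input_ids=decoder_input_ids,
                         max_new_tokens=256)
    print(tokenizer.decode(out[0], skip_special_tokens=True))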
