[ot][spam][crazy] Quickly autotranscribing xkcd 4/1 correctly

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Sat Apr 2 04:53:16 PDT 2022


import inspect
# s2t here is the speech-to-text model object from earlier in this thread
print(inspect.getsource(s2t.model.generate))

It looks like this speech-to-text model decodes just like the
text-based language models that got big around OpenAI: it generates
each next token autoregressively, one after the other.
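That next-token loop can be sketched in a few lines. This is a minimal sketch, not the model's actual code: `toy_step` is a made-up stand-in for the real scoring step (which scores every token in its vocabulary from the audio features and the tokens generated so far), and the token ids and EOS value are invented for illustration.

```python
def greedy_generate(step_fn, bos=0, eos=4, max_len=10):
    """Generate one token at a time: score all candidates,
    append the highest-scoring one, stop at EOS or max_len."""
    tokens = [bos]
    while len(tokens) < max_len:
        logits = step_fn(tokens)
        nxt = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens

def toy_step(tokens):
    """Stand-in scorer over a 5-token vocabulary: always prefers
    the token after the last one, so the loop is deterministic."""
    logits = [0.0] * 5
    logits[min(tokens[-1] + 1, 4)] = 1.0
    return logits

print(greedy_generate(toy_step))  # [0, 1, 2, 3, 4]
```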

This basically means that it shouldn't be hard to convert it to
arbitrary-length input, simply by writing a new generation loop and
shifting off old input.
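Shifting off old input amounts to sliding a fixed-size window over the stream. A minimal sketch, where `win` and `hop` are hypothetical sizes (in practice you'd transcribe each window and stitch the transcripts, using the `win - hop` samples of overlap to align them):

```python
def sliding_windows(samples, win, hop):
    """Yield fixed-size windows over arbitrarily long input,
    shifting off the oldest samples each step."""
    for start in range(0, len(samples), hop):
        yield samples[start:start + win]

chunks = list(sliding_windows(list(range(10)), win=4, hop=3))
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9], [9]]
```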

It also means that at each step it likely outputs a confidence
(logit) for every possible token it could emit. So if the right
tokens are represented, we could grow the parts of the model that
produce them, by backpropagating a loss computed from those
confidences.
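The standard loss for this is cross-entropy over the per-token logits: it grows as the target token's confidence shrinks, so backpropagating it pushes the model toward that token. A minimal pure-Python version for illustration (a real training loop would use the framework's autograd version, e.g. torch.nn.functional.cross_entropy):

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target token under a softmax
    over the model's per-token confidences (logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

# Uniform logits over 4 tokens: the loss is log(4) for any target.
print(cross_entropy([0.0, 0.0, 0.0, 0.0], 1))
# Raising the target token's logit lowers the loss.
print(cross_entropy([5.0, 0.0, 0.0, 0.0], 0))
```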

This model isn't great for this overall approach, because its
tokenizer is focused on conversational speech rather than the symbols
and keywords found in source code. But models focused on
conversational speech are mostly what you're going to find nowadays.


More information about the cypherpunks mailing list