oh, and for completing the vocab: I suppose you'd pass the whole recording through the original model and then add its output to the tokenizer. I think that'll pick up enough words.
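a rough sketch of that vocab step, assuming a toy word-level vocab (a plain `dict` of word → id) and that `transcript` is whatever text the original model produced for the recording. `extend_vocab` and `min_count` are hypothetical names, not from any library. with a real Hugging Face tokenizer the analogous call would be `tokenizer.add_tokens(new_words)`, but subword tokenizers won't split text the same way as this toy does.

```python
from collections import Counter


def extend_vocab(vocab: dict[str, int], transcript: str, min_count: int = 1) -> dict[str, int]:
    """Return a copy of `vocab` with unseen words from `transcript` appended.

    Hypothetical helper: `transcript` is assumed to be the original model's
    output for the whole recording. Words are lowercased and split on whitespace.
    """
    new_vocab = dict(vocab)
    counts = Counter(transcript.lower().split())
    next_id = max(new_vocab.values(), default=-1) + 1
    # sort so new ids are assigned deterministically
    for word, n in sorted(counts.items()):
        if n >= min_count and word not in new_vocab:
            new_vocab[word] = next_id
            next_id += 1
    return new_vocab
```

`min_count` lets you skip one-off mis-transcriptions if the model output is noisy.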

then the random-noise stuff would preserve the non-code text.

but it'd be more interesting to map state machines to logit matrices.
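one way to read "state machines to logit matrices" (my interpretation, not necessarily what's meant): give each state of a DFA a row over the vocab with 0 for allowed tokens and -inf for everything else, stack the rows into a (num_states × vocab_size) matrix, and add the current state's row to the model's logits before picking a token. this is the usual constrained-decoding trick. the function names and the greedy pick are assumptions for illustration, and plain lists stand in for tensors.

```python
import math

NEG_INF = -math.inf


def build_mask_table(transitions: dict[tuple[int, int], int], num_states: int, vocab_size: int) -> list[list[float]]:
    """One row per DFA state: 0.0 where (state, token) has a transition, -inf elsewhere."""
    table = [[NEG_INF] * vocab_size for _ in range(num_states)]
    for state, token in transitions:
        table[state][token] = 0.0
    return table


def constrained_step(logits: list[float], state: int, table: list[list[float]],
                     transitions: dict[tuple[int, int], int]) -> tuple[int, int]:
    """Greedy pick among tokens the DFA allows in `state`; returns (token, next_state)."""
    masked = [l + m for l, m in zip(logits, table[state])]
    token = max(range(len(masked)), key=masked.__getitem__)
    if masked[token] == NEG_INF:
        raise ValueError(f"state {state} has no allowed tokens")
    return token, transitions[(state, token)]
```

e.g. with tokens `0="{"`, `1="}"`, `2="x"`, `3=eos` and transitions `{(0,0):1, (1,2):1, (1,1):2, (2,3):2}`, state 0 forces `{` even if the model strongly prefers `}`. swapping the greedy `max` for sampling over the masked logits works the same way.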