oh and for completing the vocab: I suppose you'd pass the whole
   recording through the original model and then add the output to the
   tokenizer. that should pick up enough words, I think.
   then the random noise stuff would preserve the non-code text
   but it'd be more interesting to map state machines to logit matrices
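
rough sketch of what I mean by the state machine to logit matrix thing, just one way to read it: one row per state, one column per token, 0.0 where the machine allows the token and -inf where it doesn't, so adding the current state's row to the model's logits masks out everything illegal. the toy vocab, the transitions, and the names (`fsm_to_logit_matrix`, `constrained_argmax`, `decode`) are all made up for illustration, not from any real library.

```python
NEG_INF = float("-inf")

def fsm_to_logit_matrix(transitions, num_states, vocab):
    """Build an additive bias matrix: row = state, column = token.
    0.0 means the FSM allows that token in that state, -inf means it doesn't."""
    index = {tok: i for i, tok in enumerate(vocab)}
    matrix = [[NEG_INF] * len(vocab) for _ in range(num_states)]
    for (state, tok) in transitions:
        matrix[state][index[tok]] = 0.0
    return matrix

def constrained_argmax(logits, bias_row):
    """Add the state's bias row to the logits and pick the best token index."""
    scores = [l + b for l, b in zip(logits, bias_row)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy machine (hypothetical): accept exactly "(" then "x" then ")".
vocab = ["(", "x", ")", "hello"]
transitions = {(0, "("): 1, (1, "x"): 2, (2, ")"): 3}
bias = fsm_to_logit_matrix(transitions, num_states=4, vocab=vocab)

def decode(logits_per_step):
    """Greedy decode, masking each step with the current state's row."""
    state, out = 0, []
    for logits in logits_per_step:
        tok = vocab[constrained_argmax(logits, bias[state])]
        out.append(tok)
        state = transitions[(state, tok)]
    return out

# The fake "model" always prefers "hello", but the mask forces the FSM path.
print(decode([[0.1, 0.2, 0.3, 5.0]] * 3))
```

state 3 is terminal here so its row is all -inf; a real version would need an end-of-sequence token or similar there.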