[ot][spam][crazy] Quickly autotranscribing xkcd 4/1 correctly

Sat Apr 2 05:11:35 PDT 2022

Ok, so let's check out this detokenizer.

I don't know how speech-to-text models convert a long stream of
samples into a sequence of tokens. I suspect they do something to
avoid feeding output back in around word boundaries. It seems to me
they take pains to keep feedback out of their architectures, but I
could be wrong.
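
For reference, one common feedback-free approach is CTC-style greedy
decoding. I have not confirmed this model uses it, so treat the below
as a sketch under that assumption. Each frame gets a label chosen
independently, then repeated labels are collapsed and a "blank" label
is dropped. The vocabulary, blank id, and scores here are made up for
illustration.

```python
from itertools import groupby

BLANK = 0  # assumed id of the CTC "blank" label (hypothetical)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Turn per-frame label scores into a token id sequence.

    frame_scores: one list of per-label scores for each audio frame.
    """
    # Pick the best label for each frame independently of the others,
    # so nothing decoded earlier is fed back in.
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # Collapse runs of the same label, then drop the blanks.
    collapsed = [label for label, _ in groupby(best)]
    return [label for label in collapsed if label != blank]


# Made-up three-label vocabulary: 0 = blank, 1 = "h", 2 = "i".
VOCAB = {1: "h", 2: "i"}
frames = [
    [0.1, 0.8, 0.1],    # "h"
    [0.1, 0.7, 0.2],    # "h" again, collapsed into the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # "i"
    [0.1, 0.1, 0.8],    # "i" again
]
tokens = ctc_greedy_decode(frames)
text = "".join(VOCAB[t] for t in tokens)
```

The blank is what lets a genuinely doubled token survive: label runs
of 1, blank, 1 decode to two tokens rather than one.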