[ot][spam][crazy] Quickly autotranscribing xkcd 4/1 correctly

Undiscussed Horrific Abuse, One Victim of Many gmkarl at gmail.com
Sat Apr 2 04:56:44 PDT 2022


So, at least at first, the tokenizer is where a little hand-curated
data might fit in here. Then we can see whether high-confidence regions
of the Logo code can be used to update low-confidence regions. We could
also plan to pass the confidence through other stages, such as a Logo
parser and a heuristic, to improve accuracy.
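A rough sketch of what splitting by confidence and a Logo heuristic might look like, assuming the model gives a probability for each output token. The function names, the 0.9 threshold and the small command list are all made up for illustration. The "heuristic" here only checks that brackets balance and counts how many words look like Logo commands or numbers. It is nowhere near a real Logo parser.

```python
# Hypothetical sketch: group a transcription into confident / uncertain spans
# using per-token probabilities, then score text with a crude Logo heuristic.
# The threshold, command list and function names are assumptions.

LOGO_WORDS = {"fd", "bk", "lt", "rt", "pu", "pd", "repeat",
              "forward", "back", "left", "right", "penup", "pendown"}


def split_by_confidence(tokens, probs, threshold=0.9):
    """Merge consecutive tokens into (text, is_confident) spans."""
    spans = []
    for tok, p in zip(tokens, probs):
        confident = p >= threshold
        if spans and spans[-1][1] == confident:
            # Same confidence class as the previous span: extend it.
            spans[-1] = (spans[-1][0] + tok, confident)
        else:
            spans.append((tok, confident))
    return spans


def logo_plausibility(text):
    """Return 0.0 for unbalanced brackets, else the fraction of words that
    are known commands, integers, or brackets."""
    depth = 0
    for ch in text:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                return 0.0
    if depth != 0:
        return 0.0
    words = text.replace("[", " [ ").replace("]", " ] ").split()
    if not words:
        return 1.0
    ok = sum(1 for w in words
             if w.lower() in LOGO_WORDS
             or w in ("[", "]")
             or w.lstrip("-").isdigit())
    return ok / len(words)
```

For example, with tokens `["fd", " 100", " rt", " 9O"]` and probabilities `[0.99, 0.97, 0.95, 0.4]`, the last token comes out as its own low-confidence span. The heuristic also scores `"fd 100 rt 9O"` below `"fd 100 rt 90"`, so both signals point at the same place to fix.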

We need a detokenizer that can produce Logo code.
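A minimal detokenizer sketch, assuming a SentencePiece-style vocabulary where "▁" marks the start of a word and there are `<pad>`, `<s>` and `</s>` specials. The vocabulary below is invented for illustration. The real tokenizer in use may work differently.

```python
# Hypothetical detokenizer: token ids -> Logo source text.
# Assumes a SentencePiece-style vocab where "\u2581" ("▁") marks a word start;
# VOCAB and the special-token names are made up for this example.

WORD_BOUNDARY = "\u2581"

VOCAB = ["<pad>", "<s>", "</s>", "\u2581repeat", "\u25814", "\u2581[",
         "\u2581fd", "\u258110", "0", "\u2581rt", "\u258190", "\u2581]"]


def detokenize(ids, vocab, skip=("<pad>", "<s>"), eos="</s>"):
    """Join pieces, dropping padding/start tokens and stopping at end-of-sequence."""
    pieces = []
    for i in ids:
        piece = vocab[i]
        if piece == eos:
            break
        if piece in skip:
            continue
        pieces.append(piece)
    # Word-boundary markers become spaces; the leading one is stripped.
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()
```

With this vocabulary, `detokenize([1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 2], VOCAB)` produces `repeat 4 [ fd 100 rt 90 ]`. The `"▁10"` and `"0"` pieces merge into `100` because the second piece has no boundary marker. Logo accepts the spaces inside the brackets.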

It's likely also helpful to unroll the generation code a little, so
that we can see the information related to confidence and reach more of
the places where loss is calculated and backpropagation is performed
during training.
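A rough idea of what "unrolled" could mean, using only the standard library. Here `step_logits(ids)` is an assumed interface that returns next-token logits for a prefix. Decoding is written as an explicit loop so that each step's chosen-token probability (its confidence) is visible. A second loop computes the per-token teacher-forced cross-entropy, which is where training loss would come from. In a real framework these losses would be tensors that backpropagation flows through. This plain-float version only shows where they appear.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_unrolled(step_logits, prefix, max_steps, eos_id):
    """Greedy decoding as an explicit loop; returns ids and a per-step
    (token_id, probability) trace. step_logits is an assumed interface."""
    ids = list(prefix)
    trace = []
    for _ in range(max_steps):
        probs = softmax(step_logits(ids))
        next_id = max(range(len(probs)), key=probs.__getitem__)
        trace.append((next_id, probs[next_id]))  # confidence of this step
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids, trace


def teacher_forced_losses(step_logits, prefix, target):
    """Per-token cross-entropy, feeding the true target token at each step."""
    ids = list(prefix)
    losses = []
    for t in target:
        probs = softmax(step_logits(ids))
        losses.append(-math.log(probs[t]))  # loss contributed by this position
        ids.append(t)
    return losses


def toy_model(ids):
    """Made-up 3-token model (0 = end, 1 = 'fd', 2 = 'rt') for demonstration."""
    return [0.0, 2.0, 0.0] if len(ids) < 3 else [3.0, 0.0, 0.0]
```

Running `generate_unrolled(toy_model, [1], 5, eos_id=0)` gives ids `[1, 1, 1, 0]` along with the probability of each greedy choice. Low values in that trace would mark the low-confidence regions mentioned earlier.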


More information about the cypherpunks mailing list