It's audio of Logo code that draws a comic. The xkcd community I visit is
[1]explainxkcd.com, so it's not too advanced. xkcd is the only webcomic I've
read regularly since 2014 or so. I'm also reading Seed a bit, which just
recently pushed through a block regarding shielded rooms; it remains to be
seen whether it keeps going.

What's relevant here is the idea of recovering material when you don't have
a reference for what is correct. You could do it other ways, of course, but
there is more return in learning to do it with only the available material.

Thoughts: whatever technique you initially approach the problem with will
have some way of extracting confidence in its output. With transformer
models, this is usually an explicit part of the output (per-token
probabilities). If we have multiple independent indications of confidence,
is it reasonable to use that information to refine and update the models
autonomously, and have them autocomplete the image reconstruction?

--

Here's an example:

- speech-to-text produces Logo code with associated per-token confidence
- a Logo interpreter can then mark the code as syntactically correct or
  incorrect

Then we can make a third metric out of a simple heuristic, for example: how
likely are very long, straight lines to be correct? The screenshot at
[2]https://github.com/somebody1234/xkcd2601 has many long, straight lines
that look incorrect.

By calibrating against the existing confidences, we can tune a confidence
metric for the heuristic, and the heuristic can then provide additional
confidence information on the output. By fine-tuning the speech-to-text
model on this additional signal, it learns properties of the speech in the
recording from knowing which outputs are correct and which aren't. That
improves its output across the board, autocorrecting even errors the
heuristic didn't detect, because the correction information transfers back
to the speech data. A concrete sketch of this loop is at the end of the
post.

--

This is part of something I've thought about a fair amount that is hard to
transcribe. There's a way to produce further feedback and improvement after
that first step, and a way to automate the heuristic generation itself:
simply assume that some of the output is correct in different ways and that
the correctness is transferable.

References

1. http://explainxkcd.com/
2. https://github.com/somebody1234/xkcd2601
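
To make the loop above concrete, here is a minimal sketch in Python of the
three confidence signals and one way to fuse them. Everything in it is a
stand-in: transcribe() fakes a speech-to-text model (real ones expose
per-token probabilities), LOGO_COMMANDS is a toy subset, and the cutoffs
and weights are assumed, untuned priors.

  import re

  # Toy subset of Logo commands; a real checker would use a full interpreter.
  LOGO_COMMANDS = {"fd", "bk", "rt", "lt", "pu", "pd"}

  def transcribe(audio_chunk):
      """Signal 1: stand-in for a speech-to-text model that returns
      (token, confidence) pairs; real models expose per-token probabilities."""
      return [("fd", 0.93), ("100", 0.88), ("rt", 0.41), ("90", 0.95)]

  def syntax_ok(tokens):
      """Signal 2: does the transcript parse as command/number pairs?"""
      words = [t for t, _ in tokens]
      if len(words) % 2 != 0:
          return False
      return all(cmd in LOGO_COMMANDS and re.fullmatch(r"-?\d+", arg)
                 for cmd, arg in zip(words[::2], words[1::2]))

  def straight_line_penalty(tokens):
      """Signal 3, the heuristic: very long straight segments (huge fd/bk
      arguments) are suspicious, like the stray lines in the screenshot
      at [2]. The 500-step cutoff and 0.5 discount are assumed priors."""
      words = [t for t, _ in tokens]
      penalty = 1.0
      for cmd, arg in zip(words[::2], words[1::2]):
          if cmd in ("fd", "bk") and arg.lstrip("-").isdigit() \
                  and abs(int(arg)) > 500:
              penalty *= 0.5
      return penalty

  def fused_confidence(tokens):
      """Combine the three signals into one score per chunk."""
      stt = min(c for _, c in tokens)             # weakest token confidence
      syntax = 1.0 if syntax_ok(tokens) else 0.2  # assumed syntax penalty
      return stt * syntax * straight_line_penalty(tokens)

  if __name__ == "__main__":
      chunk = transcribe(None)
      score = fused_confidence(chunk)
      verdict = ("keep as fine-tuning pseudo-label" if score > 0.8
                 else "re-decode")
      print(f"fused confidence {score:.2f} -> {verdict}")

The multiplicative fusion is just the simplest choice. The point is the
loop: chunks that score high become pseudo-labels for fine-tuning the
speech-to-text model, and the fine-tuned model is then re-run on the chunks
that scored low.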