It's audio of Logo code that draws a comic. The xkcd community I visit is explainxkcd.com, so it's not too advanced. xkcd is the only webcomic I have read regularly since 2014 or so. I'm also reading seed a bit, which just recently pushed through a block regarding shielded rooms; it remains to be seen if it keeps going.

What's relevant here is the idea of recovering material when you do not have a reference for what is correct. You could do it otherwise, of course, but there is more return in learning to do it with only the available material.

Thoughts: whatever technique you choose to initially approach the problem with will have some way of extracting confidence in its output. With transformer models, this is usually a clear part of the output. If we have multiple indications of confidence, is it reasonable then to use this information to refine and update the models autonomously, and have them autocomplete the image reconstruction?

--

Here's an example:

- speech-to-text produces Logo code with associated confidence
- a Logo machine can then mark code as having correct or incorrect syntax

Then we can actually make a third metric out of a simple heuristic, for example: how likely are very long and straight lines to be correct? The screenshot at https://github.com/somebody1234/xkcd2601 has many long and straight lines that look incorrect.

By using the existing confidence, we can tune a confidence metric for our heuristic. Then this heuristic can provide additional confidence information on the output. By finetuning the speech-to-text model on this additional information, the model can then learn properties of the speech in the recording, since it now knows which output is correct and which is not. This improves its output, autocorrecting errors that the heuristic did not detect, because the correction information transfers to the speech data.

--

It's part of something I've thought about some that is hard to transcribe.
There's a way to produce further feedback and improvement after that first step, and a way to automate the heuristic generation, by simply assuming that some of the output is correct in different ways and that the correctness is transferable.
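
To make the first step a bit more concrete, here is a rough sketch of tuning the "long straight line" heuristic against the confidence we already have. Everything in it is an assumption made for illustration: the `Command` record, the idea that lines come from `fd`/`forward`/`bk`/`back` commands, the length threshold, and especially the choice to combine scores by multiplying them. It does not touch a real speech-to-text model; it only shows how existing confidence could act as a soft label for calibrating a new heuristic.

```python
from dataclasses import dataclass


@dataclass
class Command:
    """Hypothetical record for one transcribed Logo command."""
    text: str        # e.g. "fd 120"
    stt_conf: float  # confidence reported by the speech-to-text model, 0..1
    syntax_ok: bool  # whether a Logo interpreter accepted the command


def line_length(cmd):
    """Return the drawn line length for a movement command, else None."""
    parts = cmd.text.split()
    if len(parts) == 2 and parts[0].lower() in ("fd", "forward", "bk", "back"):
        try:
            return abs(float(parts[1]))
        except ValueError:
            return None
    return None


def base_confidence(cmd):
    """Existing confidence: the STT score, zeroed when the syntax check fails."""
    return cmd.stt_conf if cmd.syntax_ok else 0.0


def calibrate_long_line_heuristic(commands, threshold):
    """Estimate how trustworthy long and short lines are on average.

    The existing confidence is used as a soft label, so no reference
    transcript is needed. Keys: True = long line, False = short line.
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for cmd in commands:
        length = line_length(cmd)
        if length is None:
            continue  # not a line-drawing command
        is_long = length > threshold
        sums[is_long] += base_confidence(cmd)
        counts[is_long] += 1
    return {k: (sums[k] / counts[k] if counts[k] else None) for k in sums}


def combined_confidence(cmd, calibration, threshold):
    """Scale the existing confidence by the calibrated heuristic score.

    Multiplying is an arbitrary, simple choice made for this sketch.
    """
    base = base_confidence(cmd)
    length = line_length(cmd)
    if length is None:
        return base
    heuristic = calibration[length > threshold]
    return base if heuristic is None else base * heuristic


if __name__ == "__main__":
    transcript = [
        Command("fd 10", 0.9, True),
        Command("fd 500", 0.3, True),
        Command("rt 90", 0.95, True),
    ]
    calibration = calibrate_long_line_heuristic(transcript, threshold=100)
    for cmd in transcript:
        print(cmd.text, combined_confidence(cmd, calibration, threshold=100))
```

The combined score is what would then decide which stretches of audio and text are trusted enough to finetune on, and which are treated as likely errors.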