
On 9 Aug 2001, Dr. Evil is alleged to have written:
> I have a question for you c'punks. If you wanted to generate some bitmaps of text which would be difficult or impossible to OCR, but not too difficult for humans to read, how would you do that? Basically, I want to create GIFs of text which can't be OCRed in a reliable way. I've thought about some things: I can put in noise pixels, I can blur the text, I can rotate, shear, and otherwise distort it.
It depends a lot on your threat model. If the people who want a copy are determined enough, they'll just retype it :-) If you're trying to make signs that video cameras can't read, that's a different problem from trying to publish comic books that teenagers with too much time on their hands can't scan, or trying to publish source code on paper so that your customers can inspect the crypto without being able to scan, modify, and compile it. (The latter may satisfy the GNU General Public License (:-), but it isn't particularly useful for crypto, because people can't use it to produce a binary they can trust...)

If your goal is to make OCR require so much manual tweaking that the reader might as well just retype the text, here's what I'd do: split each letter into multiple pieces, use different colors for the different parts of the letter, and vary the color maps across the page. Do the same for the background space. And dither the pieces!

OCR programs usually work by first deciding which parts of the image belong to the letter and which don't, and then identifying features of the letter (a vertical stroke on the left, a horizontal one in the middle, a vertical in the lower right, etc.). So instead of finding the black stuff on the white background, or the yellow stuff on the blue background, the OCR has to find the green-and-cyan dithered stuff and the aqua-and-turquoise dithered stuff on the blue-and-indigo dithered background and the indigo-and-purple background, and further down the page you've shuffled other colors in and out of the mix. So even if it's smart enough to edge-detect blobs of dithered stuff on top of other dithered stuff, the blobs don't add up to recognizable letters; they add up to fragments that only become a letter if you put them all together successfully.
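For the curious, the fragment-and-dither idea can be sketched in a few lines of Python. Treat this as a toy under loudly stated assumptions. The 5x7 bitmap "A", the RGB values, the diagonal cuts used to split letter and background into pieces, the 50% checkerboard dither, the band height, and the names `render_glyph` and `to_ppm` are all hypothetical choices made for illustration, not a tested recipe. It writes plain-text PPM so it needs no imaging library; turning that into a GIF is left to whatever image tool you like.

```python
"""Sketch of the fragment-and-dither idea (all parameters are illustrative assumptions)."""
from typing import List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical 5x7 bitmap for the letter "A" ('#' marks ink).
GLYPH_A = [
    ".###.",
    "#...#",
    "#...#",
    "#####",
    "#...#",
    "#...#",
    "#...#",
]

# Dither pairs for letter fragments; RGB values are arbitrary picks, not standard colour definitions.
FG_PAIRS: List[Tuple[RGB, RGB]] = [
    ((0, 160, 0), (0, 255, 255)),     # "green" / "cyan"
    ((0, 200, 190), (64, 224, 208)),  # "aqua" / "turquoise"
    ((120, 200, 60), (0, 140, 140)),  # extra pair so the mapping can rotate
]

# Dither pairs for background fragments; none of these colours appear in FG_PAIRS.
BG_PAIRS: List[Tuple[RGB, RGB]] = [
    ((0, 0, 255), (75, 0, 130)),      # "blue" / "indigo"
    ((75, 0, 130), (128, 0, 128)),    # "indigo" / "purple"
    ((40, 40, 160), (90, 0, 90)),     # extra pair
]


def render_glyph(glyph: List[str], page_y: int = 0, band_height: int = 4,
                 n_pieces: int = 2) -> List[List[RGB]]:
    """Colour one glyph whose top row sits at page row `page_y`."""
    out: List[List[RGB]] = []
    for gy, row in enumerate(glyph):
        y = page_y + gy
        shift = y // band_height  # colour maps change every `band_height` page rows
        line: List[RGB] = []
        for x, cell in enumerate(row):
            if cell == "#":
                piece = ((x + y) // 3) % n_pieces  # diagonal cuts split the letter into pieces
                pair = FG_PAIRS[(piece + shift) % len(FG_PAIRS)]
            else:
                piece = ((x - y) // 3) % n_pieces  # cut the background the other way
                pair = BG_PAIRS[(piece + shift) % len(BG_PAIRS)]
            line.append(pair[(x + y) % 2])  # 50% checkerboard dither between the two colours
        out.append(line)
    return out


def to_ppm(pixels: List[List[RGB]]) -> str:
    """Serialise to plain-text PPM (P3), viewable without an imaging library."""
    height, width = len(pixels), len(pixels[0])
    body = "\n".join(" ".join(f"{r} {g} {b}" for r, g, b in row) for row in pixels)
    return f"P3\n{width} {height}\n255\n{body}\n"


if __name__ == "__main__":
    # Stack the same letter three times down the page so the colour shift is visible.
    page: List[List[RGB]] = []
    for i in range(3):
        page += render_glyph(GLYPH_A, page_y=i * len(GLYPH_A))
    print(to_ppm(page), end="")
```

Redirect the output to a file (e.g. `python sketch.py > a.ppm`) and the same "A" comes out in different color fragments in each band. A real version would need a proper font rasterizer, larger glyphs, and testing against an actual OCR package to see whether the fragments really defeat its segmentation.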