Making text difficult for OCR?

Thu Aug 9 21:04:47 PDT 2001

>On 9 Aug 2001, Dr. Evil is alleged to have written:
> >I have a question for you c'punks.  If you wanted to generate some bitmaps
> >of text which would be difficult or impossible to OCR, but not too
> >difficult for humans to read, how would you do that?  Basically, I want to
> >create GIFs of text which can't be OCRed in a reliable way. I've thought
> >about some things: I can put in noise pixels, I can blur the text, I can
> >rotate, shear, and otherwise distort it.

It depends a lot on your threat model.  If the people who want a copy
are determined enough, they'll just retype it :-)
If you're trying to make signs that video-cameras can't read,
that's a different problem than trying to publish comic books
that teenagers with too much time on their hands can't scan,
or trying to publish source code on paper so that your customers
can inspect the crypto without being able to scan/modify/compile it.
(The latter may satisfy the GNU General Public License :-), but it isn't
particularly useful for crypto, because people can't use it to
produce a binary they can trust...)

If your problem is to make the OCR job require enough manual tweaking
that the reader might as well just retype it, here's what I'd do:
split up each letter into multiple pieces, using different colors for
the different parts of the letter, and vary the color maps across the page.
Also do this for the background space.  And dither the pieces!
OCR programs usually work by first deciding which parts of the image are
in a letter and which aren't, and then identifying features of the letter
(vertical on the left, horizontal in the middle, vertical in the lower right, etc.).
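Here is a minimal sketch of the fragment-and-dither idea. It is not the author's method, and every name and value in it is an assumption. The hypothetical `render_glyph` takes a glyph mask and cuts the ink into vertical bands and the background into horizontal bands. Each band gets a pair of similar colors, using made-up RGB values for green/cyan, aqua/turquoise, blue/indigo and indigo/purple. The two colors of each pair are dithered in a checkerboard, and an `offset` parameter shuffles which pairs land where, so successive glyphs on a page get different color maps.

```python
Color = tuple[int, int, int]

# Illustrative color pairs (assumed RGB values, not from the post).
# The two colors of a pair are dithered together to paint one piece.
INK_PAIRS: list[tuple[Color, Color]] = [
    ((0, 128, 0), (0, 255, 255)),      # green / cyan
    ((0, 210, 230), (64, 224, 208)),   # roughly aqua / turquoise
]
PAPER_PAIRS: list[tuple[Color, Color]] = [
    ((0, 0, 255), (75, 0, 130)),       # blue / indigo
    ((75, 0, 130), (128, 0, 128)),     # indigo / purple
]


def render_glyph(mask: list[str], n_pieces: int = 3,
                 offset: int = 0) -> list[list[Color]]:
    """Render a glyph mask ('#' = ink, anything else = background).

    Hypothetical helper: the ink is cut into n_pieces vertical bands and the
    background into n_pieces horizontal bands. Each band gets a color pair
    chosen by (band + offset), and the pair is dithered in a checkerboard.
    Varying offset from glyph to glyph shuffles the colors across a page.
    """
    h, w = len(mask), len(mask[0])
    image: list[list[Color]] = []
    for y, row in enumerate(mask):
        out_row: list[Color] = []
        for x, cell in enumerate(row):
            if cell == "#":
                band = x * n_pieces // w       # vertical bands for the ink
                pair = INK_PAIRS[(band + offset) % len(INK_PAIRS)]
            else:
                band = y * n_pieces // h       # horizontal bands for the paper
                pair = PAPER_PAIRS[(band + offset) % len(PAPER_PAIRS)]
            # 50% ordered dither: alternate the pair's two colors.
            out_row.append(pair[(x + y) % 2])
        image.append(out_row)
    return image


LETTER_H = [
    "#....#",
    "#....#",
    "######",
    "#....#",
    "#....#",
]

if __name__ == "__main__":
    img = render_glyph(LETTER_H, offset=1)
    ink_colors = {img[y][x] for y, r in enumerate(LETTER_H)
                  for x, c in enumerate(r) if c == "#"}
    print(len(ink_colors), "distinct ink colors")  # prints: 4 distinct ink colors
```

The straight bands only keep the sketch short. Irregular pieces, more pairs per page, and a mix of dither patterns would all fit the same idea.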

So instead of having to find the black stuff on the white background,
or the yellow stuff on the blue background, it has to find the
green and cyan dither stuff and the aqua and turquoise dither stuff
on the blue and indigo dithered background and the indigo and purple
dithered background, and further down the page you've shuffled other
colors in and out of the mix.
So even if it's smart enough to edge-detect blobs of dithered stuff
on top of other dithered stuff, the blobs don't add up to recognizable
letters; they add up to fragments that only become a letter if you
put them all together successfully.
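The sketch below shows why the blobs come out as fragments. It assumes the OCR is smart enough to average away the dither, so each piece collapses to one flat color. The hypothetical `dedithered` helper and its colors are assumptions standing in for that step. The sketch then segments the image into 4-connected blobs of equal color with a flood fill. It models only a color-segmentation front end, not any particular OCR package.

```python
from collections import deque

Color = tuple[int, int, int]

# Assumed flat colors for each piece after the dither has been averaged away.
INK = [(0, 190, 130), (32, 217, 219), (0, 160, 200)]
PAPER = [(37, 0, 192), (101, 0, 129), (60, 0, 160)]


def dedithered(mask: list[str], n_pieces: int = 3) -> list[list[Color]]:
    """Hypothetical stand-in for a de-dithered image: ink pieces are
    vertical bands, background pieces are horizontal bands."""
    h, w = len(mask), len(mask[0])
    return [[INK[x * n_pieces // w] if c == "#" else PAPER[y * n_pieces // h]
             for x, c in enumerate(row)] for y, row in enumerate(mask)]


def blobs(image: list[list[Color]]) -> list[set[tuple[int, int]]]:
    """4-connected components of equal color, found by breadth-first fill."""
    h, w = len(image), len(image[0])
    seen: set[tuple[int, int]] = set()
    found: list[set[tuple[int, int]]] = []
    for y in range(h):
        for x in range(w):
            if (x, y) in seen:
                continue
            color, blob, queue = image[y][x], set(), deque([(x, y)])
            seen.add((x, y))
            while queue:
                cx, cy = queue.popleft()
                blob.add((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen
                            and image[ny][nx] == color):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            found.append(blob)
    return found


LETTER_H = [
    "#....#",
    "#....#",
    "######",
    "#....#",
    "#....#",
]

if __name__ == "__main__":
    ink = {(x, y) for y, row in enumerate(LETTER_H)
           for x, c in enumerate(row) if c == "#"}
    ink_blobs = [b for b in blobs(dedithered(LETTER_H)) if b <= ink]
    # prints: 3 ink blobs; whole letter found: False
    print(len(ink_blobs), "ink blobs; whole letter found:", ink in ink_blobs)
```

For the sample "H", the segmentation returns three separate ink blobs:

- the left stroke plus a stub of the crossbar
- the middle of the crossbar
- the right stroke plus the rest of the crossbar

None of these blobs is the whole letter. Gluing them back together is the manual tweaking that makes retyping look attractive.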