OCR and Machine Readable Text

Rabid Wombat wombat at mcfeely.bsfs.org
Fri Jan 3 12:19:27 PST 1997



Accuracy will depend on the quality of the original being scanned, as 
well as the capability of the OCR system; flat originals scan much better 
than the "bent open" pages of a book or magazine, heavy stock tends to 
let less "bleed" through from the reverse side, fonts with extreme 
kerning are more difficult, point size is a factor, etc.

I've seen 97%+ w/ Calera (about 2 years ago) when using flat, first-generation,
high-quality photocopies w/ minimal skew and Courier or a similar typeface.
OTOH, the same system did not scan well at all w/ badly skewed photocopies
(caused by the "bend" induced by the binding of the original). If you are
scanning medical journals, take a look at your originals and also at where
the errors are occurring.
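
One way to see where the errors are occurring is to hand-correct a sample
page and compare it w/ the raw OCR output. What follows is only a rough
sketch in Python using the standard difflib module; the function name
ocr_accuracy and the sample strings are made up for illustration, and the
"accuracy" here is just matched characters divided by reference length,
not any vendor's official metric.

```python
import difflib
from collections import Counter


def ocr_accuracy(reference, ocr_output):
    """Compare hand-checked text with OCR output (hypothetical helper).

    Returns (accuracy, substitutions): accuracy is the fraction of
    reference characters that were matched, and substitutions counts
    (expected_char, ocr_char) pairs for same-length replacements.
    """
    matcher = difflib.SequenceMatcher(None, reference, ocr_output,
                                      autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    accuracy = matched / len(reference) if reference else 1.0

    substitutions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only tally one-for-one replacements; insertions and deletions
        # lower the accuracy figure but are not counted as confusions.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for expected, got in zip(reference[i1:i2], ocr_output[j1:j2]):
                substitutions[(expected, got)] += 1
    return accuracy, substitutions


if __name__ == "__main__":
    acc, subs = ocr_accuracy("Dose 38 mg B12", "Dose 88 mg 812")
    print(f"accuracy: {acc:.1%}")
    for (expected, got), count in subs.most_common():
        print(f"{expected!r} read as {got!r}: {count}")
```

Running this over a few sample pages from each kind of original (flat
copies vs. pages scanned from a bound volume) should show whether the
errors cluster on particular characters or particular sources.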

You can also use a spell checker (after building up a suitable dictionary
for your application) to cut out some of the errors.
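
The spell-check pass could be sketched roughly as follows. This is an
illustration only, not a real spell checker: the function name
flag_suspect_words, the tiny word list, and the 0.8 similarity cutoff are
all assumptions, and the suggestions come from Python's standard
difflib.get_close_matches.

```python
import difflib
import re


def flag_suspect_words(text, dictionary, cutoff=0.8):
    """Return {word: [suggestions]} for words missing from the dictionary.

    `dictionary` is whatever application-specific word list you have
    built up; `cutoff` is the minimum similarity for a suggestion.
    """
    known = sorted({w.lower() for w in dictionary})
    known_set = set(known)
    suspects = {}
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered in known_set:
            continue
        # Unknown word: record it along w/ the closest dictionary entries.
        suspects[word] = difflib.get_close_matches(
            lowered, known, n=3, cutoff=cutoff)
    return suspects


if __name__ == "__main__":
    words = ["the", "patient", "showed", "cardiac", "arrhythmia"]
    page = "The patient showed cardlac arrhythmia"
    print(flag_suspect_words(page, words))
```

A pass like this only catches errors that produce non-words; an OCR
mistake that happens to yield another valid word slips through.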

I'd guess your results to be less satisfactory for other applications
where extreme accuracy is a must. "3", "8", and "B", for example, are
often confused; not a big problem w/ a medical journal, but it plays havoc
w/ code, accounting data, etc.
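
For code or figures, a spell checker is no help, so one crude option is
to flag every token containing an easily confused character and check
those by hand against the original. A minimal sketch, assuming the
confusable set is just the three characters mentioned above (a real set
would be longer and depend on the font); flag_ambiguous_tokens is a
made-up name:

```python
CONFUSABLE = set("38B")  # assumption: only the characters named above


def flag_ambiguous_tokens(text, confusable=CONFUSABLE):
    """Return (line_number, token) pairs that need a manual check."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in line.split():
            if any(ch in confusable for ch in token):
                flagged.append((lineno, token))
    return flagged


if __name__ == "__main__":
    sample = "total 1380\nref AB12\nname ok"
    for lineno, token in flag_ambiguous_tokens(sample):
        print(f"line {lineno}: check {token!r} against the original")
```

This does not fix anything by itself; it only narrows down where a human
has to look.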

-r.w.

On Fri, 3 Jan 1997, /**\anonymous/**\ wrote:

> Alan Olsen wrote:
> > I used to work for a company that would transfer entire archives of medical
> > journals.  Much of it we would just OCR.  Some of it we would send
> > offshore.  The OCR software was about 95% reliable and this was over 5 years
> > ago.  (And we were using 286 boxes for much of the OCR work.  Not a heavy
> > technological investment.)  I am sure that things have improved a great
> > deal since then.  (My new scanner included OCR software.  I will have to
> > run a test and report the findings.)
> 
> 	I'd like to know what OCR software you were using.  All tests we
> completed at my place of employment were very poor quality-wise.  We
> showed a 65% accuracy rate.  Not very good when you need to transfer a
> five-year backlog of medical and technical journals.  This was using a
> high-resolution scanner with a package that was bundled along with it.
> About a year ago, my employer considered transferring data taken off of
> forms into a relational database using an OCR program.  Again, we found
> the results to be too inaccurate for our needs.  I may have just been
> using the wrong programs for the job, but the findings were depressing...
> 
> panther
> 
> > ---
> > |   If you're not part of the solution, You're part of the precipitate.  |
> > |"The moral PGP Diffie taught Zimmermann unites all| Disclaimer:         |
> > | mankind free in one-key-steganography-privacy!"  | Ignore the man      |
> > |`finger -l alano at teleport.com` for PGP 2.6.2 key  | behind the keyboard.|
> > |         http://www.ctrl-alt-del.com/~alan/       |alan at ctrl-alt-del.com|
> 





