
Accuracy will depend on the quality of the original being scanned, as well as the capability of the OCR system; flat originals scan much better than the "bent open" pages of a book or magazine, heavy stock tends to let less "bleed" through from the reverse side, fonts with extreme kerning are more difficult, point size is a factor, etc. I've seen 97%+ w/ Calera (about 2 years ago) when using flat, first-generation, high-quality photocopies w/ minimal skew and Courier or a similar typeface. OTOH, the same system did not scan well at all w/ badly skewed photocopies (caused by the "bend" induced by the binding of the original). If you are scanning medical journals, take a look at your originals and also at where the errors are occurring. You can also use a spell checker (after building up a suitable dictionary for your application) to cut out some of the errors. I'd guess your results would be less satisfactory for other applications where extreme accuracy is a must. "3", "8", and "B", for example, are often confused; not a big problem w/ a medical journal, but it plays havoc w/ code, accounting data, etc.

-r.w.

On Fri, 3 Jan 1997, /**\anonymous/**\ wrote:
Alan Olsen wrote:
I used to work for a company that would transfer entire archives of medical journals. Much of it we would just OCR. Some of it we would send offshore. The OCR software was about 95% reliable and this was over 5 years ago. (And we were using 286 boxes for much of the OCR work. Not a heavy technological investment.) I am sure that things have improved a great deal since then. (My new scanner included OCR software. I will have to run a test and report the findings.)
I'd like to know what OCR software you were using. All the tests we completed at my place of employment were very poor quality-wise. We showed a 65% accuracy rate. Not very good when you need to transfer a five-year backlog of medical and technical journals. This was using a high-resolution scanner with a package that was bundled along with it. About a year ago, my employer considered transferring data taken off of forms into a relational database using an OCR program. Again, we found the results to be too inaccurate for our needs. I may have just been using the wrong programs for the job, but the findings were depressing...
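Figures like 95% or 65% are only comparable if they are measured the same way. Nobody in this thread says how their numbers were computed, so the following is just one common convention, sketched in Python: hand-key a sample page as ground truth, then count the character edits (insertions, deletions, substitutions) needed to turn the OCR output into it. The function names are made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, ocr: str) -> float:
    """Fraction of ground-truth characters recovered, clamped to [0, 1]."""
    if not truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


# One wrong character out of ten gives 90% on this tiny sample.
sample_accuracy = char_accuracy("dose 38 mg", "dose 3B mg")
```

Running this per page, rather than over the whole archive, also shows where the errors cluster (skewed pages, small type, and so on).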
panther
---
| If you're not part of the solution, You're part of the precipitate. |
|"The moral PGP Diffie taught Zimmermann unites all| Disclaimer: |
| mankind free in one-key-steganography-privacy!" | Ignore the man |
|`finger -l alano@teleport.com` for PGP 2.6.2 key | behind the keyboard.|
| http://www.ctrl-alt-del.com/~alan/ |alan@ctrl-alt-del.com|
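
The spell-checker suggestion from earlier in the thread can be sketched roughly as follows. This is a minimal illustration in Python, not how any particular OCR package works: the confusable set is just the "3"/"8"/"B" example mentioned above, the word list stands in for whatever application dictionary you build up, and the function names are hypothetical. A token that is not in the dictionary is replaced only when exactly one confusable-character variant of it is.

```python
from itertools import product

# Characters the thread reports as commonly confused; extend for your material.
CONFUSABLE_GROUPS = [{"3", "8", "B"}]


def variants(token):
    """Yield every spelling obtained by swapping confusable characters."""
    options = []
    for ch in token:
        group = next((g for g in CONFUSABLE_GROUPS if ch in g), None)
        options.append(sorted(group) if group else [ch])
    for combo in product(*options):
        yield "".join(combo)


def correct_token(token, dictionary):
    """Return the unique dictionary variant of token, else token unchanged."""
    if token in dictionary:
        return token
    matches = {v for v in variants(token) if v in dictionary}
    # Ambiguous or unknown tokens are left alone for a human to review.
    return matches.pop() if len(matches) == 1 else token


words = {"Blood", "B12"}                 # stand-in application dictionary
fixed = correct_token("8lood", words)    # "Blood"
```

This only helps where a dictionary exists. A part number or a ledger entry has no "correct spelling" to check against, which is why the same confusion is harmless in journal text and a real problem in code or accounting data. The number of variants also grows quickly for tokens with many confusable characters, so long numeric strings should probably be skipped.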