Surprisingly fast PGP5.5(i) scanning and proofreading

Anonymous remailer at sabotage.net
Fri Jan 23 08:43:35 PST 1998



From: ica at hawking.fokus.gmd.de (Ingmar Camphausen)
Newsgroups: de.comp.security,alt.security.pgp,comp.security.pgp.discuss
Subject: Surprisingly fast PGP5.5(i) scanning and proofreading
Message-ID: <6aaf44$id8 at stern.fokus.gmd.de>

I am probably not the only one surprised how "out of a sudden" 
the 5.5 version of PGP was scanned and proofread by Stale Schumacher
alone (as it seemed to me). 

Hence I asked him and Teun Nijssen (they, according to the README in
the PGPsdk 5.5 distribution on www.pgpi.com, did all the work) how
come they were so much faster and there wasn't a coordinated 
"distributed" proofreading as with the "international" version of PGP5.0.

Teun took the time for a detailed answer and was so kind to permit its
free distribution; so here is what he wrote in reply to my request:

---8<--snip--------------------------------------------------------

Hello Ingmar,

there are no secrets in this reply; you may do with it whatever you like.

/I was somewhat surprised how suddenly and fast PGP5.5(i) was scanned *and*
/proofread.

Thank you.

/Having participated in the proofreading of the 5.0i version and with the
/amount of time spent by the "correctors" all over the world in mind, I had
/the impression that 5.5 was there and proofread like "a bolt out of the
/blue"!

It took roughly 100 hours of stamina, between Christmas and 5 January to scan
the complete 5.5 books and to correct the 5000 pages of the PGPsdk and Win
version.

Scanning was no longer an issue, as we had now an Automatic Document Feeder
for the HP Scanjet 4C at our disposal. This ADF has been the hero of the two
weeks: actually it did not scan 7500 pages ("tools" book + 5.5) but about
10.000 pages, since an initial set of about 3000 pages was discarded when
the OCR quality proved not optimal yet. This wonderful ADF misfed exactly zero
pages on all this effort. It can contain 50 pages input, so 10.000 pages
require 200 manual interventions 30 minutes separated from each other (the HP
Scanjet 4C combined with Omnipage Pro takes a little bit less than 30 seconds
per page). One Pentium did nothing but scanning, several days starting at 9:00
and finishing around 23:00.

The reason that correction went so much better than on either of the two
editions of PGP 5.0 was in the new format of the books. The fonts are now
excellent, including visible 'tab' characters (kind of triangles), visible
spaces (if more than one space is used somewhere, the second one is printed as
a tiny black triangle) and a special 'underscore' with little curls. This new
book format is supported by a truely marvelous new 'repair' utility, that
obviously was optimized to correct the results that come from using an
Omnipage version HP Scanjet 4c: the intro of the book "Tools for Publishing
Source Code via OCR" describe setting-up the OCR on a Mac/Omnipage/HP 4c.

Anyway, repairing 100 pages now takes 4 hours worst case (if very many visible
spaces are used: Omnipage often mis-counts multiple visible spaces). And best
case: 100 pages can be repaired in half an hour. That is twice the speed of
scanning.

The true horror of PGP 5.0 was the BINARIES.ZIP file. It was done within
8 hours for 5.5 Windows. The trick was simply to manually delete any page that
contained any error. Then run the 'sortpage' utility to report what was
missing and *re-scan* those pages. In this way it went from 217 pages to 30
and then to only 6 pages that needed real correction, because at the margin of
these pages some lines had not got enough black ink from the book print
process.

/Did you get any special ...emmmm...support?

very definitely not! Neither would we have accepted it should it have been
available. It was not available. We think this game should be played "by the
book" (sic).

The books were simply an order of magnitude better.
I think that someone at PGP Inc really tested scanning this time, before
printing. Now if they print a next bookset, let's hope they pretty-print
everything, discarding trailing visible whitespace and replacing initial
spaces by tabs on all lines...

/Or did I just miss a "call for volunteers" etc. like the one that
/preceeded the release of PGP5.0i?

Nope, there was no call this time, because Stale and me could do it alone.

/I am very interested in a satisfying answer to this question, for it is
/closely related to the credibility of the international PGP website and
/the new software version.

I agree with that objective. That is the reason that I wrote this answer as
extensively as I did and why I permit you to publish it as widely as you like.

cheers,

teun

Senior Project Manager
Computer Centre
Tilburg University

------------------------------------------>8----snip-------------

The original reply was PGP-signed by Teun; it should be available via 
http://www.in-berlin.de/User/aurora/teun_pgp55-ocr.txt from 
today, ~ 7pm UTC (GMT) on.

	Ingmar 

-- 
Ingmar Camphausen   camphausen at fokus.gmd.de   PGP key on server 
GMD FOKUS  =  Research Institute for Open Communication Systems 







More information about the cypherpunks-legacy mailing list