Surprisingly fast PGP5.5(i) scanning and proofreading
From: ica@hawking.fokus.gmd.de (Ingmar Camphausen)
Newsgroups: de.comp.security,alt.security.pgp,comp.security.pgp.discuss
Subject: Surprisingly fast PGP5.5(i) scanning and proofreading
Message-ID: <6aaf44$id8@stern.fokus.gmd.de>

I am probably not the only one surprised how "out of a sudden" the 5.5 version of PGP was scanned and proofread by Stale Schumacher alone (as it seemed to me).

Hence I asked him and Teun Nijssen (they, according to the README in the PGPsdk 5.5 distribution on www.pgpi.com, did all the work) how come they were so much faster and there wasn't a coordinated "distributed" proofreading as with the "international" version of PGP5.0.

Teun took the time for a detailed answer and was so kind to permit its free distribution; so here is what he wrote in reply to my request:

---8<--snip--------------------------------------------------------

Hello Ingmar,

there are no secrets in this reply; you may do with it whatever you like.

/I was somewhat surprised how suddenly and fast PGP5.5(i) was scanned *and*
/proofread.

Thank you.

/Having participated in the proofreading of the 5.0i version and with the
/amount of time spent by the "correctors" all over the world in mind, I had
/the impression that 5.5 was there and proofread like "a bolt out of the
/blue"!

It took roughly 100 hours of stamina, between Christmas and 5 January to scan the complete 5.5 books and to correct the 5000 pages of the PGPsdk and Win version.

Scanning was no longer an issue, as we had now an Automatic Document Feeder for the HP Scanjet 4C at our disposal. This ADF has been the hero of the two weeks: actually it did not scan 7500 pages ("tools" book + 5.5) but about 10.000 pages, since an initial set of about 3000 pages was discarded when the OCR quality proved not optimal yet. This wonderful ADF misfed exactly zero pages on all this effort.
It can contain 50 pages input, so 10.000 pages require 200 manual interventions 30 minutes separated from each other (the HP Scanjet 4C combined with Omnipage Pro takes a little bit less than 30 seconds per page). One Pentium did nothing but scanning, several days starting at 9:00 and finishing around 23:00.

The reason that correction went so much better than on either of the two editions of PGP 5.0 was in the new format of the books. The fonts are now excellent, including visible 'tab' characters (kind of triangles), visible spaces (if more than one space is used somewhere, the second one is printed as a tiny black triangle) and a special 'underscore' with little curls.

This new book format is supported by a truly marvelous new 'repair' utility, that obviously was optimized to correct the results that come from using an Omnipage version with an HP Scanjet 4c: the intro of the book "Tools for Publishing Source Code via OCR" describes setting up the OCR on a Mac/Omnipage/HP 4c.

Anyway, repairing 100 pages now takes 4 hours worst case (if very many visible spaces are used: Omnipage often miscounts multiple visible spaces). And best case: 100 pages can be repaired in half an hour. That is twice the speed of scanning.

The true horror of PGP 5.0 was the BINARIES.ZIP file. It was done within 8 hours for 5.5 Windows. The trick was simply to manually delete any page that contained any error. Then run the 'sortpage' utility to report what was missing and *re-scan* those pages. In this way it went from 217 pages to 30 and then to only 6 pages that needed real correction, because at the margin of these pages some lines had not got enough black ink from the book print process.

/Did you get any special ...emmmm...support?

very definitely not! Neither would we have accepted it should it have been available. It was not available. We think this game should be played "by the book" (sic).

The books were simply an order of magnitude better.
I think that someone at PGP Inc really tested scanning this time, before printing. Now if they print a next bookset, let's hope they pretty-print everything, discarding trailing visible whitespace and replacing initial spaces by tabs on all lines...

/Or did I just miss a "call for volunteers" etc. like the one that
/preceded the release of PGP5.0i?

Nope, there was no call this time, because Stale and I could do it alone.

/I am very interested in a satisfying answer to this question, for it is
/closely related to the credibility of the international PGP website and
/the new software version.

I agree with that objective. That is the reason that I wrote this answer as extensively as I did and why I permit you to publish it as widely as you like.

cheers,

teun

Senior Project Manager
Computer Centre
Tilburg University

------------------------------------------>8----snip-------------

The original reply was PGP-signed by Teun; it should be available via http://www.in-berlin.de/User/aurora/teun_pgp55-ocr.txt from today, ~ 7pm UTC (GMT) on.

Ingmar

--
Ingmar Camphausen    camphausen@fokus.gmd.de    PGP key on server
GMD FOKUS = Research Institute for Open Communication Systems
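
The figures in Teun's reply can be checked with a few lines of arithmetic, and the delete-and-rescan trick he describes for BINARIES.ZIP boils down to a set difference between the pages expected and the pages that survived proofreading. The following is a minimal Python sketch under stated assumptions. It is not the 'sortpage' utility from the book, whose interface is not described in the post. All function names are hypothetical, and the 30 seconds per page is used as an upper bound, since the reply says "a little bit less than 30 seconds".

```python
import math


def adf_loads(total_pages, tray_capacity):
    """Number of times the document feeder has to be refilled by hand."""
    return math.ceil(total_pages / tray_capacity)


def minutes_per_load(tray_capacity, seconds_per_page):
    """Pure scanning time for one full tray, in minutes."""
    return tray_capacity * seconds_per_page / 60


def missing_pages(expected_count, clean_pages):
    """Pages 1..expected_count that are absent from the set of clean pages.

    This mimics the workflow in the reply: delete every page that contains
    an error, then list what is missing so those pages can be re-scanned.
    """
    return [p for p in range(1, expected_count + 1) if p not in clean_pages]


if __name__ == "__main__":
    # Figures quoted in the reply: about 10.000 pages, 50 pages per tray.
    print(adf_loads(10_000, 50))        # 200
    # Upper bound of 30 s per page (an assumption, see lead-in).
    print(minutes_per_load(50, 30))     # 25.0
    # Toy example: pages 3 and 5 were deleted because they had errors.
    print(missing_pages(5, {1, 2, 4}))  # [3, 5]
```

With the numbers from the reply this gives 200 refills and at most 25 minutes of scanning per tray, which is roughly in line with the quoted "30 minutes separated from each other" once time for reloading is allowed for. Repeating the delete, report and re-scan cycle shrinks the list of missing pages each round, as in the reported drop from 217 pages to 30 and then to 6.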