Re: Searchable Crypto Paper Archive?
At 12:48 PM 9/6/95, David J. Bianco wrote:
I was trying to dig up some cryptography papers cited as references today, when a thought hit me; there seem to be a fair amount of crypto papers available on the Net, but they're pretty scattered. Bell Labs has some online, which is great! The cypherpunks FTP archive has a few, though you can't perform keyword searches against them. In short, it's hard to find papers unless you already know what you want and where it might be.
Having had some experience in designing and implementing technical report retrieval services, I naturally think there's room for improvement here. 8-) What I have in mind is something like NASA's NTRS ("NASA Technical Report Server", <http://techreports.larc.nasa.gov/cgi-bin/NTRS>), which I helped design and implement at my last job.
It's an idea with some attraction. But some issues need discussing. Being an analytical sort of person, prone to looking for flaws in ideas, I'll mention a few:

1. First and foremost, _copyright_ issues. Most articles are copyrighted (automatically, by Berne Convention) and the permission of the authors must be obtained. Authors may also collect royalties, or the conferences may, so unlimited electronic distribution is a potential problem.

NASA can publish its reports (and those of other government agencies) electronically because it has the copyrights, or the copyrights are free and clear. Try putting someone's article on the Net without their permission and look out.

Indeed, there are a couple of the most important papers on the soda archive site, some of them scanned-in and OCRed by "The Information Liberation Front." There are so few that the authors likely don't even know they are there, or care. But try to put lots of copyrighted material on a site and get ready for actions. Remember, most nations are party to the Berne Convention(s).

2. Many of the papers have complex typography, lots of equations and diagrams. These reproduce poorly on most screens, and really need a new level of display presentation. (Yes, I know about Adobe Acrobat, which I have. Ditto for FrameMaker, and a few other such systems. But not many others have them.)

I happen to know the ILF member who posted the Chaum "Dining Cryptographers" paper, anonymously, and know that he picked that paper both because of its importance to his interests and because it was pure text, with no equations and no diagrams. This made it a natural for scanning.

3. In the crypto domain, the papers are much more conveniently concentrated into a handful of conference proceedings, nearly all published by Springer-Verlag. (Those great silvery-grey paperbacks.)

This point about Springer-Verlag relates to Item #1 above. Namely, that copyright holders (Springer-Verlag, through publishing arrangements with the conferences) will not take kindly to folks making the papers available electronically.

This point, about the limited number of main crypto volumes, also implies another point: many of these papers refer to other papers in the same volume or set of volumes (e.g., papers in the "Crypto '93 Proceedings" will refer to papers in that volume or earlier volumes). This makes it *even more advantageous* for a serious researcher to buy the complete set of volumes.

4. Authentication issues. Electronic versions of articles will need to be signed, to prevent unauthorized modifications. The infrastructure for this is beginning to build, but is clearly not available to many.

I am confident that someday most journals will be published electronically. Many people think this likely, whether in 5 years or 15 years. Just too many advantages.

However--and this is my point--before that happens a huge amount of negotiation about author's rights to reproduction, about verification of copies, about royalty payments for copies, etc., has to happen. And, the display software/hardware is not quite there yet....too many people would be unable to see the equations and diagrams on the screen. In 5 years, less of a problem.

Many authors make their papers available by anonymous ftp, or via the Web. I think this is the way to do it: let those who feel their papers need electronic dissemination do so. The author makes the choice.

In summary, this project is probably premature (technologically), has numerous copyright issues to be resolved, and is probably less needed in the crypto community than in some other areas.

(Granted, we are not following those other areas, necessarily. But that other domains have not yet gone fully electronic is indicative that others see some of these same problems, and are likely to address them before the math/crypto community does.)

Sorry to dissect this proposal so thoroughly, but it's one of the things I do.

--Tim May

(P.S. The copyright problems can possibly be skirted by using anonymous remailers and offshore data havens in jurisdictions that will not raid the sites, or by message pools. But these are major steps, mostly untested. A "Scientology" site is probably a better test than a site with crypto papers. I wouldn't want to run either of them.)

---------:---------:---------:---------:---------:---------:---------:----
Timothy C. May              | Crypto Anarchy: encryption, digital money,
tcmay@got.net  408-728-0152 | anonymous networks, digital pseudonyms, zero
Corralitos, CA              | knowledge, reputations, information markets,
Higher Power: 2^756839      | black markets, collapse of governments.
"National borders are just speed bumps on the information superhighway."

On Sep 6, 9:36, Timothy C. May sent the following to the NSA's mail archives:
Subject: Re: Searchable Crypto Paper Archive?
Thanks for the reply. I think there are a few misconceptions, though. I've responded in place to some of your comments...

||
|| It's an idea with some attraction. But some issues need discussing. Being
|| an analytical sort of person, prone to looking for flaws in ideas, I'll
|| mention a few:
||
|| 1. First and foremost, _copyright_ issues. Most articles are copyrighted
|| (automatically, by Berne Convention) and the permission of the authors must
|| be obtained. Authors may also collect royalties, or the conferences may, so
|| unlimited electronic distribution is a potential problem.
||
|| NASA can publish its reports (and those of other government agencies)
|| electronically because it has the copyrights, or the copyrights are free
|| and clear. Try putting someone's article on the Net without their
|| permission and look out.
||
|| Indeed, there are a couple of the most important papers on the soda archive
|| site, some of them scanned-in and OCRed by "The Information Liberation
|| Front." There are so few that the authors likely don't even know they are
|| there, or care. But try to put lots of copyrighted material on a site and
|| get ready for actions. Remember, most nations are party to the Berne
|| Convention(s).

Hmmm... I guess I didn't specifically mention this point since it seemed obvious to me, though I probably should have: papers should come from the authors or the organization which holds the copyright. I wouldn't be in favor of accepting 3rd-party submissions, for both copyright and authenticity/integrity reasons.

||
|| 2. Many of the papers have complex typography, lots of equations and
|| diagrams. These reproduce poorly on most screens, and really need a new
|| level of display presentation. (Yes, I know about Adobe Acrobat, which I
|| have. Ditto for FrameMaker, and a few other such systems. But not many
|| others have them.)
||
|| I happen to know the ILF member who posted the Chaum "Dining
|| Cryptographers" paper, anonymously, and know that he picked that paper both
|| because of its importance to his interests and because it was pure text,
|| with no equations and no diagrams. This made it a natural for scanning.
||

The model we've used so far is that the format of the papers is independent of the bibliographic information which we index. For example, the NASA system I mentioned has papers in both HTML and PostScript formats. The abstracts (which are what's indexed) simply contain URLs, and don't really care what the document types are.

In my experience, most of the target audience for technical papers has access to a PostScript previewer (for online viewing) and/or a PostScript printer, so PostScript tends to be the format of choice. Still, it can be anything; text, PDF, and scanned-in TIFF files have all worked for us in the past.

|| 3. In the crypto domain, the papers are much more conveniently concentrated
|| into a handful of conference proceedings, nearly all published by
|| Springer-Verlag. (Those great silvery-grey paperbacks.)
||
|| This point about Springer-Verlag relates to Item #1 above. Namely, that
|| copyright holders (Springer-Verlag, through publishing arrangements with
|| the conferences) will not take kindly to folks making the papers available
|| electronically.
||
|| This point, about the limited number of main crypto volumes, also implies
|| another point: many of these papers refer to other papers in the same
|| volume or set of volumes (e.g., papers in the "Crypto '93 Proceedings" will
|| refer to papers in that volume or earlier volumes). This makes it *even
|| more advantageous* for a serious researcher to buy the complete set of
|| volumes.
||

Now that's a pretty good point. Wonder if we could convince them to make their papers available electronically? 8-) But ignoring them, there still seem to be a fair number of cryptography papers published as technical reports by individual authors or organizations. These would be what I'd like to see in CTRS.

|| 4. Authentication issues. Electronic versions of articles will need to be
|| signed, to prevent unauthorized modifications. The infrastructure for this
|| is beginning to build, but is clearly not available to many.
||
|| I am confident that someday most journals will be published electronically.
|| Many people think this likely, whether in 5 years or 15 years. Just too
|| many advantages.
||

Another good point, but I think this could easily be marked down as an issue to be worked on after the basic functionality is available. I'd hate to see this as a reason for not doing something.

|| However--and this is my point--before that happens a huge amount of
|| negotiation about author's rights to reproduction, about verification of
|| copies, about royalty payments for copies, etc., has to happen. And, the
|| display software/hardware is not quite there yet....too many people would
|| be unable to see the equations and diagrams on the screen. In 5 years, less
|| of a problem.
||
|| Many authors make their papers available by anonymous ftp, or via the Web.
|| I think this is the way to do it: let those who feel their papers need
|| electronic dissemination do so. The author makes the choice.

This is exactly the target audience I'm looking for. When an author wants to put a paper up on his FTP or WWW site, I hope they'll also send me the indexing information so that when people want to find it, they can use CTRS. I'm not interested in actually storing a copy of the report, although I'm willing to do so if they cannot make it available any other way.

||
|| In summary, this project is probably premature (technologically), has
|| numerous copyright issues to be resolved, and is probably less needed in
|| the crypto community than in some other areas.
||
|| (Granted, we are not following those other areas, necessarily. But that
|| other domains have not yet gone fully electronic is indicative that others
|| see some of these same problems, and are likely to address them before the
|| math/crypto community does.)
||

I have to disagree strongly about the technologically premature part, since I have had a lot of experience to the contrary during my involvement with several major technical report systems.

I'm afraid I also have to disagree with you about the need for this service. Having attempted to find some of the reports which I've heard are available on the Net, I'd have to say it's not a task I'd set an Internet novice to, or one I'd give to someone on a deadline. I think a good bibliographic database like the one I propose in CTRS would be a definite help. And at the very very very least, it probably won't hurt. 8-)

|| Sorry to dissect this proposal so thoroughly, but it's one of the things I do.
||

S'ok with me. It's not like I'm dead set on doing this or anything. It's just an observation, and an offer of service if anyone thinks it'd be useful.

Oh, BTW, another thing I probably should mention that seems obvious to me: I'm offering to do this for free. That is, the database would be a public service, with no charge to list papers, add another database to the searching list, or query/retrieve abstracts.

--
==========================================================================
David J. Bianco                | Web Wonders, Online Oddities, Cool Stuff
iTribe, Inc.                   |
Suite 1700, World Trade Center | email: <bianco@itribe.net>
Norfolk, VA 23510              | URL  : http://www.itribe.net/~bianco/
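
The indexing model described in the reply above (index only the bibliographic record and abstract, and let each record point at the paper by URL, whatever its format) can be illustrated with a small sketch. This is not how NTRS or the proposed CTRS is actually implemented; the `Record` fields, the function names, and the two sample entries with their example.org URLs are all hypothetical, invented for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One bibliographic entry; the paper itself stays wherever `url` points."""
    title: str
    authors: str
    abstract: str
    url: str  # may point at PostScript, HTML, PDF, text, TIFF, ...


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric keywords."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(records):
    """Map each keyword in a title or abstract to the positions of matching records."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in tokenize(record.title + " " + record.abstract):
            index[word].add(position)
    return index


def search(index, records, query):
    """Return the records that contain every keyword in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [records[i] for i in sorted(hits)]


# Placeholder entries, invented for illustration only.
RECORDS = [
    Record("Example Paper on Anonymous Messaging", "A. Author",
           "Describes an untraceable message protocol.",
           "ftp://ftp.example.org/pub/papers/anon.ps"),
    Record("Example Report on Digital Signatures", "B. Author",
           "Surveys signature schemes for message authentication.",
           "http://www.example.org/reports/sig.html"),
]
INDEX = build_index(RECORDS)

for record in search(INDEX, RECORDS, "message protocol"):
    print(record.title, "->", record.url)
```

Because the index never touches the paper itself, an author can keep a report on his own FTP or WWW site in any format and submit only the record, which is the arrangement the reply asks for.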