-----BEGIN PGP SIGNED MESSAGE-----

I know this may be getting off track on this list, but it may be worthwhile.

I was exploring the concept of a "data haven", which, to my knowledge, is a place whose location is unknown to its users, but where files can be stored and retrieved via anonymous remailers.

I am going to look into writing a script or program that will allow people to store items using a passphrase or their PGP key, and retrieve and delete the files on demand.

Here are my problems though:

1: I am clueless about Perl, and not that great with C.

2: One would have to "hide" behind a VERY TRUSTABLE remailer, one that does not go down all the time, and one that accepts PGP encoded mail.

3: Would hiding behind one remailer or two be secure enough? There is a problem: unlike simple remailer chaining, people need to be able to e-mail the script.

4: A need for verifying that the mail got to the DH successfully, since data errors do occur, and sometimes networks truncate mail packets. (Compuserve is notorious about this, so is Fidonet.)

5: A way of verifying that the user is who (s)he claims to be. (PGP, IDEA, or a passphrase)

6: Multiple security levels, so files cannot be retrieved even if one's PGP key is compromised (user settable)

7: How will files be stored? Will folders and directories actually be made, or will they all be stored in one place with weird names (to prevent name collisions) and one file be the index? Will there be user names or UIDs?

8: There will need to be a way to tell if the DH is up or not.

9: How will PGP keys be stored and indexed? One would not want their files mailed in the clear. (How would I mail files if the user cannot use PGP? Have a user settable password, and use crypt?)

10: How would people be able to trust a DH? Data havens, by definition, must be _very_ reliable, yet in a secure location to prevent unauthorized access to the files. What bothers me is DHs starting up and either croaking unexpectedly or being places for Bad Guys (TM) to be able to snarf unsuspecting people's files. Perhaps a reputation based system?

11: How would a DH turn away files because the disk is full?

12: Would integrating DigiDollars with a DH be a good idea? (For secure storage of your files, we charge $1 DD per month per meg, and .01 DD per transaction.) What would the DH do with the files if they are not paid for, or double-spending occurs?

I will be working on a command set that one can use for sending and retrieving files to and from the DH, as well as an authentication system that can support PGP, DES (SunOS style), or crypt (yes, laughable security, but some people cannot use PGP at work).

I think I will eventually use Perl for writing this, but I know nothing about Perl, so I will have to print out a manual or two and do some work on my Linux box...

As per my previous posts, I am very clueless, but if I can get a decent data haven script working, it will be worth all the flames :-).

Once the script is written, all one has to do is install the script, ping a Penet type remailer, then post the anon remailer address, and voila, a DH now exists.

I apologize for the length of this post, but there are a lot of questions and problems in making a stable, usable data haven.

- ---
Finger dfloyd@lonestar.utsa.edu for PGP key, and please use it when mailing me.

-----BEGIN PGP SIGNATURE-----
Version: 2.6.1

iQCVAwUBLntxFXDkimqwdwa5AQEE0gP+P+8sjma3rDkrxhZOBRam7/0v6lsUG0e9
fvtUsLHKAYaB8f6cCUUxwtpwhrI/9TPeh7QoQnEcHlhDO1kV46X9kA1n04hhJpXb
Rx+BWSNaLHB3tynaXkN0lTIR/r6CGs+zKvc8BOJpLHSL7ajowmXs1C9Z8Lf4IW+G
8IwG9TR/iec=
=9Vg8
-----END PGP SIGNATURE-----

> I was exploring the concept of a "data haven", which, to my knowledge, is a place whose location is unknown to its users, but where files can be stored and retrieved via anonymous remailers.

This is certainly on-topic. As stated, however, the outline suffers badly from a confusion of purpose. It is not necessary to solve every problem that can be thought of, merely to solve the most important problem in such a way that allows it to be combined with other known solutions.

Specifically, the proposal worries far too much about communications security and routing issues, which best go elsewhere in the abstraction. The main service proposed is data storage, not anonymous remailing. Remailing can be done with other segments.

Secondly, such storage need not be tied to identity. There's no need for passwords or passphrases or even public keys. The main idea here is storage. You want the property that arbitrary people can't scan the storage facility for content, but identity, while it would work, is _more_ than is necessary. (Can anybody anticipate the solution? See below.)

> 2: One would have to "hide" behind a VERY TRUSTABLE remailer, [...]

This is a concern about communications, and is not necessary to the main idea of remote archiving.

> 4: A need for verifying that the mail got to the DH successfully, since data errors do occur, and sometimes networks truncate mail packets.

Again, this communication issue should be dealt with in a separate layer that is concerned with the reliability of communications.

> 5: A way of verifying that the user is who (s)he claims to be.

Identity-based retrieval is possible, but it's not necessary. Since the service is single purpose (storage) and won't be dealt with directly by humans, i.e. no command prompt, but rather will act as a back end for some retrieval process, the persistence of identity isn't required at the back end. Some persistence will certainly be useful, but it can occur at the user's end.

> 6: Multiple security levels, so files cannot be retrieved even if one's PGP key is compromised (user settable)

This is really overkill. Every bit of complication makes the code harder to design, harder to write, harder to debug, and harder to deploy. A simple solution with the basic function can later be elaborated upon.

> 8: There will need to be a way to tell if the DH is up or not.

If you make a request, and nothing comes back, it's not up. I don't see the value in extra functionality.

> 9: How will PGP keys be stored and indexed?

Again, this issue can be finessed. At least part of the issue is a communications one as well, which is best dealt with elsewhere.

> 10: How would people be able to trust a DH?

If you store only encrypted data--and only the stupid would not--the only bit of trust is in continued uptime. Replication and redundancy can be handled at the user's end. At some point _every_ replication bottoms out to the unreplicated storage of some bit of data. This is the primitive, and this deserves to get implemented first.

> 11: How would a DH turn away files because the disk is full?

Silent failure should work just fine. Disk space limitations are just as difficult to deal with as communication failures.

> 12: Would integrating DigiDollars with a DH be a good idea?

At some point when they exist, yes. Right now, without such mechanisms, requiring this will prevent any deployment.

> I apologize for the length of this post, but there are a lot of questions and problems in making a stable, usable data haven.

Looking to implement the final goal as a first project is doomed to failure. Implementing a simple primitive as an attainable project is a much better idea.

Now for some specifics.

There is a package called Almanac which is a file-by-mail server. Leveraging off this code is a good place to start. Lots of the basic issues are already solved.

Now, about authentication. The basic service is storage. It's not even providing name access to the storage. The data itself is what is desired, and a cryptographic one-way hash function suffices as a name. Knowledge of the hashcode provides all the authentication that is needed. If you don't know the hashcode, you can't get the file. If you do know the hashcode, you can. No one else can guess the hashcode, and since no one else knows these hashcodes, the hashcodes suffice as a replacement for the persistence of identity.

Furthermore, the many files stored by a particular individual are not linked together in any way on the remote site. The storage site need not have this data; in fact even having this data introduces another security risk. The software on the user end can keep track of any mapping desired. Some sort of tracking software on the user end will be needed in any case to keep track of what is stored where; it may as well keep track of a remote name mapping.

So the primitives to implement are very simple; there are two: "store text T" and "retrieve the text with hashcode N". Perhaps a third is also desired: "is text with hashcode N present?". This kind of system is very simple.

For implementation of the back end, the files can be stored with filenames which are hexadecimal representations of their hashcodes. This representation allows one to leverage the existing index structure of the file system, avoiding the need to code one inside the application.

For the front end, a log file will suffice for a trial version of name mapping. The retrieval method is "grep by hand". Something more advanced can be implemented later, perhaps something that looks like a file system or an ftp site.

Eric
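
A minimal sketch of the back end described above, given here in Python purely for illustration. The message names no particular hash function, so SHA-256 is an assumption, as are the class name `HashStore` and its method names. Each file is stored under the hexadecimal form of its hashcode, so the file system itself serves as the index. Note that the hashcode is only as hard to guess as the stored text, which is one more reason to store only encrypted data.

```python
import hashlib
import tempfile
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


class HashStore:
    """Hypothetical back end: files are named by the hex hash of their content."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, hashcode):
        # Accept only a 64-character lowercase hex string (SHA-256, assumed),
        # so a request can never name a path outside the storage directory.
        if len(hashcode) != 64 or not set(hashcode) <= HEX_DIGITS:
            return None
        return self.root / hashcode

    def store(self, text: bytes) -> str:
        """Primitive 1: store text T; the returned hashcode is its only name."""
        hashcode = hashlib.sha256(text).hexdigest()
        (self.root / hashcode).write_bytes(text)
        return hashcode

    def retrieve(self, hashcode: str):
        """Primitive 2: return the text with hashcode N, or None if absent."""
        path = self._path(hashcode)
        if path is None or not path.is_file():
            return None
        return path.read_bytes()

    def present(self, hashcode: str) -> bool:
        """Optional primitive 3: is text with hashcode N present?"""
        path = self._path(hashcode)
        return path is not None and path.is_file()


if __name__ == "__main__":
    store = HashStore(tempfile.mkdtemp())
    name = store.store(b"already-encrypted bytes go here")
    print(name, store.present(name), store.retrieve(name))
```

Nothing in the store links one file to another or to a user; whoever holds a hashcode can fetch that one file and nothing else.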

P.S. Thanks to Bill Stewart for raising this issue last week at the physical meeting. He had a similar idea, with similar complications. There's no shame in not having complete clarity on a first proposal. The basic idea of hashcode-naming arose during Bill's presentation.

Eric
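
The front-end log file mentioned in the message could be sketched along these lines, again in Python and again only as an illustration. The tab-separated line format, the function names, and the site addresses are all hypothetical, and SHA-256 is an assumed choice of hash. The log maps a local name to a hashcode and the site holding it; `lookup` simply automates the "grep by hand" step, and `verify` uses the hashcode to check that a retrieved text arrived intact.

```python
import hashlib


def record(log_path, local_name, text: bytes, site: str) -> str:
    """Append one line 'hashcode<TAB>site<TAB>local name' to the user's log."""
    hashcode = hashlib.sha256(text).hexdigest()  # hash choice is an assumption
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{hashcode}\t{site}\t{local_name}\n")
    return hashcode


def lookup(log_path, local_name):
    """Scan the log and return every (hashcode, site) pair recorded for a name."""
    matches = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            hashcode, site, name = line.rstrip("\n").split("\t", 2)
            if name == local_name:
                matches.append((hashcode, site))
    return matches


def verify(text: bytes, hashcode: str) -> bool:
    """Check a retrieved text against the hashcode it was requested under."""
    return hashlib.sha256(text).hexdigest() == hashcode
```

Recording the same text under several sites is one way the replication "at the user's end" could be tracked: the log holds one line per copy, and the remote sites never learn that the copies are related.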

participants (2)
- dfloyd@runner.utsa.edu
- hughes@ah.com