[ANNOUNCE] hash cash postage implementation

A partial hash collision based postage scheme I've been talking about a partial hash collision based postage scheme on the crypto lists for the last few days. The idea of using partial hashes is that they can be made arbitrarily expensive to compute (by choosing the desired number of bits of collision), and yet can be verified instantly. A first cut implementation of these ideas can be fetched from here: * hashcash.tgz * PGP signature of hashcash.tgz I will describe what the implementation allows by example, using the program so you can follow along if you wish. There is an integrated partial hash collision generator (hashcash mint) and remailer plug-in. The remailer plug-in should be easy to hook into typeI remailers. typeII or nymserver would require modifying the mixmaster client/remailer, the code has been designed to make this relatively easy to do, and link directly into mixmaster, or alpha or newnym. Compiling % gcc -O6 -c *.c % gcc -o hashcash hashcash.o sha1.o % gcc -o sha1 sha1file.o sha1.o % gcc -o sha1test sha1test.o sha1.o % The functions provided by the program are Run with no arguments and it prints a list of flags and terse usage instructions. Speed test: % hashcash -t speed: 7218 hashes per sec % This tells you the number of hashes it can search per second. (Note, the implementation of sha1 it is using is not optimised, it is my reference implementation. I have another speed freak sha1 implementation which needs 1/2 hrs work, and has the same interface, so I'll plug it in later. It's commented in sha1.c). Estimate time required to find 17-bit partial hash collision: % hashcash -t -17 speed: 7274 hashes per sec find: 17 bit partial sha1 collision estimate: 9 seconds % (varies quite widely from estimated time) Calculate a 17 bit collision on string "flame" (flame is a now dead remailer): % hashcash -17 flame speed: 7274 hashes per sec find: 17 bit partial sha1 collision collision: 09948flame356018443 tries: 57450 % The collision is actually found on the hash of the date since Jan 1 1970 in days (5 digit leading zero filled) and string given. So the collision is found on: % echo -n 09948flame | sha1 fbbea210abafe3e72afe7849eaea2052e309e017 % The collision that was found can be seen manually as the collision is constrained to be the MSbits of the digest: % echo -n 09948flame356018443 | sha1 fbbead76da651cf856f7466fed9624d3ada061d9 % You can manually see that the first 20 bits match. (Note we asked for a 17 bit hash, but it generates a 17 bit or larger hash. We just got lucky and got an extra 3 bits, which would happen about 1 time in 8). The hashcash client can also report on a collision: % hashcash flame 09948flame356018443 collision: 20 bits % You can check on the validity period as compared to todays date: % hashcash flame 09948flame356018443 28 validity: 28 days remaining collision: 20 bits % You can check that a hash has a requested number of bits: % hashcash -25 flame 09948flame356018443 collision: 20 bits required: 25 bits rejected: too few bits % The exit status is set to failure if any of the tests fail: ie if there are too few bits, or if you do a validity check and the hash has expired, or isn't yet in it's validity period. Double spending protection You can also ask the hashcash client to remember collisions within a selected validity period. % hashcash -d -25 flame 09948flame356018443 28 validity: 28 days remaining collision: 20 bits required: 20 bits database: not double spent adding: 28 09948flame356018443 % The collision will only be added if all the tests pass (in validity period, required number of bits). The exit status also tells you if the hash was ok. The database would grow indefinately as the mailer accepted new payments, so the validity period is stored in the database, and at the next use of the database after the validity period expires, the collision will be discarded. There is no need to keep expired collisions around because we wouldn't accetp it anyway because it's out of date, so who cares if we've previously accepted it. A validity period of 0 means valid forever, and it will never be discarded from the database. An example of double spending A new test now is whether a hash has been presented before, so we the above command and expect it to be rejected as already spent, because it is in the database: % hashcash -d -25 flame 09948flame356018443 28 validity: 28 days remaining collision: 20 bits required: 25 bits rejected: too few bits database: double spent % (exit status is set to failure, due to double spent cash) That's it It's very lightly tested, so if anything breaks let me know. It's got some inefficiencies in places, but working code comes first, efficiency later. (Also I have not tested my SHA1 implementation on a big endian machine, it auto-detects byte endian-ness, theoretically). Remailer applications discussed so far * exit remailer postage per post * spam filter on mail you receive, if it hasn't got a 20 bit hash, or 1c digicash you have a program which bounces it with a notice explaining the required postage, and where to obtain software from. This would put spammers out of business overnight, as 1,000,000 x 20 = 100 MIP years which is going to be more compute than they've got, and 1,000,000 x 1c = $10,000 is going to be more than they are likely to make through sales interest from the spam. * good behaviour bond for nymserver users. The nymserver user pays the nymserver (in a largish hash collision, or $25 digicash) for a replyable nymserver account. They agree an contract with the penalty clause that breaking the contract means the nymserver keeps the digicash/collision, and disables the account. They user's identity isn't known, but to join in again they have to pay up another large hash collision or more digicash. * there are lots of other ideas to play with. How does this fit in with digicash Digicash postage on remailers, and mail would be useful, however there are a number of problems with digicash: * It is more onerous to set up an account (form filling etc) * Not many people have accounts * It's only anonymous for the payer anonymous (and not anonymous for the seller) So my suggestion is that we have remailers which accept either hash collision postage, or digicash postage. The advantages of this approach are: * Hashcash may provide a stop gap measure until digicash becomes more widely used * Hashcash is free, all you've got to do is burn some cycles on your PC. It is in keeping with net culture of free discourse, where the financially challenged can duke it out with millionaires, retired government officials, etc on equal terms. * Hashcash may provide us with a fall back method for controling spam if digicash goes sour (gets outlawed or required to escrow user identities). Any comments, code, etc gratefully received. A couple of remailers alpha testing it would be nice also. Adam -- Have *you* exported RSA today? --> http://www.dcs.ex.ac.uk/~aba/rsa/ print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<> )]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`

Adam, How does Dr. Bernstein's announcement of finding a 56 bit collision in md5 using a few hours on a Pentium affect this scheme? It was not clear from his post whether he was looking for a collision with a known hash, or just two different strings with a collision of the given length. On Mar 28, 4:52pm, Adam Back wrote:
(Also I have not tested my SHA1 implementation on a big endian machine, it auto-detects byte endian-ness, theoretically).
Works fine here. Big endian Mips R10K. % ./sha1test test 1 SHA1("abc") = a9993e364706816aba3e25717850c26c9cd0d89d test ok test 2 SHA1("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq") = 84983e441c3bd26ebaae4aa1f95129e5e54670f1 test ok test 3 SHA1("a" x 1,000,000) = 34aa973cd4c4daa4f61eeb2bdbad27316534016f test ok % ./hashcash -t -22 speed: 70921 hashes per sec find: 22 bit partial sha1 collision estimate: 30 seconds -- Anil Das

Anil Das <das@razor.engr.sgi.com> writes:
How does Dr. Bernstein's announcement of finding a 56 bit collision in md5 using a few hours on a Pentium affect this scheme? It was not clear from his post whether he was looking for a collision with a known hash, or just two different strings with a collision of the given length.
It is interesting that you should reference Dan Berstein's sci.crypt post of a couple of days ago, as his post is what started me thinking about using partial hash collisions as an anti-spam postage system which preserves anonymity. Also Rivest and Shamir have a system which I was aware of called "MicroMint" which makes more complex use of k-way hash collisions as the basis for a complete micro-payment system. Hal Finney described their system in outline a few days ago on cypherpunks, and pointed out the similarity. For remailers and spam control, the requirement is for a function which is expensive to compute, but easy to verify. The partial hash collision satisfies this and can be arbitrarily expensive (1/100 second to 100s of MIPs years), and yet can be verified instantly. The hashcash proposal uses this function as the basis of a decentralised approach to reduce abuse of non-metered internet resources. There are no banks, brokers or any other legally targetable entities which will get shutdown, or coerced into adding identity escrow if used for postage in anonymous communication. As a result of this, the recipient (an anonymous remailer, or private user) of the cash gains no value which can be converted into other currencies, but rather the assurance that the sender has spent more computing resources preparing the message than he will in processing the mail request. I'm fairly sure that Dan Bernstein was talking about a birthday attack, as he was referring to the fact that he had used some technique to keep the storage requirement down, and suggested that those who claimed birthday attacks required high storage where wrong. I think he was found an arbitrary 2-way collision. Arbitrary in the sense that you don't care where the bits that collide are in the hash outputs. And 2-way in the sense that you are trying to find a hash collision of any message texts (you don't care what), ie in: X = H( M ) Y = H( N ) you are trying to find an M and N such that the bits of X and Y are the same in n bit locations for an n-bit collision. If you accept arbitrary collisions it makes the job easier (few of the arbitrary collisions you find will have thier bits in your chosen location if you specify restrictions on acceptable locations for the colliding bits). Hashcash does not accept arbitrary collisions, the only collisions considered accepted by the client must be in the n MSbits. Hashcash is trying to find a collision to a specific string, so it does not involve the storage complexities of finding birthday attacks, or the methods of finding them with lower storage requirments traded for higher overhead. Specifically, the hashcash client it is trying to find an n-MSbit collision on the hash of M: M = day | resource-name X = H( M ) = H( day | resource-name ) The client looks for collisions on texts of the form: M' = M | counter = day | resource-name | counter X' = H( M' ) = H( day | resource-name | counter ) So, it's algorithm in finding hash collisions is incredibly simple: 1. calculate X = H( day | resource-name ) 2. counter = random-start-point 3. calculate X' = H( day | resource-name | counter ) 4. if the first n-bits of X and X' are equal STOP 5. counter = counter + 1 6. goto 3. The hash collision postage is M' = day | resource-name | counter. And there is no easier way to find collisions if the hash function H is one way and collision resistant, as the output bits can for these purposes be treated as individual random variables having value 0 or 1 with 50% probability. It is basically like tossing n coins in a row seeing if the coins are all equal to your chosen set of outputs (head = 0, tail = 1). Funcion H in the hashcash implementation is SHA1. The actual hash used in effect is the short hash formed by discarding the remaining 160 - n bits of the 160 bit SHA1 output. The hashcash client includes the "day" which is the days since 1 Jan 1970 to allow validity periods to be determined by acceptors of ecash, or expiry dates for services charged for in hashcash. The main purpose of the day field being hashed in is to allow the hashcash acceptor to limit the size of his double spending database. The purpose of the "resource-name" is to ensure that cash can not be double spent by spending at different remailers, or different email recipients. The resource has a name (the email address, short remailer name, whatever), and anyone who chooses a non-unique name opens themselves up to double spending. So the cash is recipient specific. You could if you wished use the hashcash client to implement a centralised double spending database, and provide online verification so that service providers can check for double spending. This gives the advantage that the user can create some collisions without having to decide on who to spend them on in advance. For email I don't think it's worth it because the main intention is to force the spammer to bear higher computer resource overheads than the innocent by-standers (remailers, and recipients of unsolicited mass mailings).
(Also I have not tested my SHA1 implementation on a big endian machine, it auto-detects byte endian-ness, theoretically).
% ./sha1test test 1
Good, my bigendian check works.
SHA1("abc") = a9993e364706816aba3e25717850c26c9cd0d89d test ok
% ./hashcash -t -22 speed: 70921 hashes per sec find: 22 bit partial sha1 collision estimate: 30 seconds
Envious of your compute power -- about 10x my 120Mhz 486 linux based system :-) Adam -- Have *you* exported RSA today? --> http://www.dcs.ex.ac.uk/~aba/rsa/ print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<> )]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`
participants (2)
-
Adam Back
-
das@razor.engr.sgi.com