
Anil Das <das@razor.engr.sgi.com> writes:
How does Dr. Bernstein's announcement of finding a 56 bit collision in md5 using a few hours on a Pentium affect this scheme? It was not clear from his post whether he was looking for a collision with a known hash, or just two different strings with a collision of the given length.
It is interesting that you should reference Dan Berstein's sci.crypt post of a couple of days ago, as his post is what started me thinking about using partial hash collisions as an anti-spam postage system which preserves anonymity. Also Rivest and Shamir have a system which I was aware of called "MicroMint" which makes more complex use of k-way hash collisions as the basis for a complete micro-payment system. Hal Finney described their system in outline a few days ago on cypherpunks, and pointed out the similarity. For remailers and spam control, the requirement is for a function which is expensive to compute, but easy to verify. The partial hash collision satisfies this and can be arbitrarily expensive (1/100 second to 100s of MIPs years), and yet can be verified instantly. The hashcash proposal uses this function as the basis of a decentralised approach to reduce abuse of non-metered internet resources. There are no banks, brokers or any other legally targetable entities which will get shutdown, or coerced into adding identity escrow if used for postage in anonymous communication. As a result of this, the recipient (an anonymous remailer, or private user) of the cash gains no value which can be converted into other currencies, but rather the assurance that the sender has spent more computing resources preparing the message than he will in processing the mail request. I'm fairly sure that Dan Bernstein was talking about a birthday attack, as he was referring to the fact that he had used some technique to keep the storage requirement down, and suggested that those who claimed birthday attacks required high storage where wrong. I think he was found an arbitrary 2-way collision. Arbitrary in the sense that you don't care where the bits that collide are in the hash outputs. And 2-way in the sense that you are trying to find a hash collision of any message texts (you don't care what), ie in: X = H( M ) Y = H( N ) you are trying to find an M and N such that the bits of X and Y are the same in n bit locations for an n-bit collision. If you accept arbitrary collisions it makes the job easier (few of the arbitrary collisions you find will have thier bits in your chosen location if you specify restrictions on acceptable locations for the colliding bits). Hashcash does not accept arbitrary collisions, the only collisions considered accepted by the client must be in the n MSbits. Hashcash is trying to find a collision to a specific string, so it does not involve the storage complexities of finding birthday attacks, or the methods of finding them with lower storage requirments traded for higher overhead. Specifically, the hashcash client it is trying to find an n-MSbit collision on the hash of M: M = day | resource-name X = H( M ) = H( day | resource-name ) The client looks for collisions on texts of the form: M' = M | counter = day | resource-name | counter X' = H( M' ) = H( day | resource-name | counter ) So, it's algorithm in finding hash collisions is incredibly simple: 1. calculate X = H( day | resource-name ) 2. counter = random-start-point 3. calculate X' = H( day | resource-name | counter ) 4. if the first n-bits of X and X' are equal STOP 5. counter = counter + 1 6. goto 3. The hash collision postage is M' = day | resource-name | counter. And there is no easier way to find collisions if the hash function H is one way and collision resistant, as the output bits can for these purposes be treated as individual random variables having value 0 or 1 with 50% probability. It is basically like tossing n coins in a row seeing if the coins are all equal to your chosen set of outputs (head = 0, tail = 1). Funcion H in the hashcash implementation is SHA1. The actual hash used in effect is the short hash formed by discarding the remaining 160 - n bits of the 160 bit SHA1 output. The hashcash client includes the "day" which is the days since 1 Jan 1970 to allow validity periods to be determined by acceptors of ecash, or expiry dates for services charged for in hashcash. The main purpose of the day field being hashed in is to allow the hashcash acceptor to limit the size of his double spending database. The purpose of the "resource-name" is to ensure that cash can not be double spent by spending at different remailers, or different email recipients. The resource has a name (the email address, short remailer name, whatever), and anyone who chooses a non-unique name opens themselves up to double spending. So the cash is recipient specific. You could if you wished use the hashcash client to implement a centralised double spending database, and provide online verification so that service providers can check for double spending. This gives the advantage that the user can create some collisions without having to decide on who to spend them on in advance. For email I don't think it's worth it because the main intention is to force the spammer to bear higher computer resource overheads than the innocent by-standers (remailers, and recipients of unsolicited mass mailings).
(Also I have not tested my SHA1 implementation on a big endian machine, it auto-detects byte endian-ness, theoretically).
% ./sha1test test 1
Good, my bigendian check works.
SHA1("abc") = a9993e364706816aba3e25717850c26c9cd0d89d test ok
% ./hashcash -t -22 speed: 70921 hashes per sec find: 22 bit partial sha1 collision estimate: 30 seconds
Envious of your compute power -- about 10x my 120Mhz 486 linux based system :-) Adam -- Have *you* exported RSA today? --> http://www.dcs.ex.ac.uk/~aba/rsa/ print pack"C*",split/\D+/,`echo "16iII*o\U@{$/=$z;[(pop,pop,unpack"H*",<> )]}\EsMsKsN0[lN*1lK[d2%Sa2/d0<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<J]dsJxp"|dc`