Cracking MD5 for $10M
I mentioned a few days ago that one of the "rump session" papers at the crypto conference claimed that a machine could be built which would find MD5 collisions for $10M in about 20 days. I wanted to write a little more detail about how this attack could work. It is similar to a "meet in the middle" (MITM) attack which Norm Hardy suggested here in July when we were discussing double DES:
There may be more than one way that MITM (meet in the middle) may be used to attack double block ciphers. I assume the following attack. You know some block of plaintext P and the corresponding ciphertext C. You believe that C = E(k, E(j, P)), where E(k, p) is the encipherment of p with key k and D(k, E(k, p)) = p. You need to find keys k and j. Classic MITM is to produce a file A with records <k, E(k, P)> for each k, and a file B with records <j, D(j, C)> for each j. Sort both A and B on the second field, then pass over the sorted files looking for a record from file A whose second field is the same as that of a record in file B.
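The classic table-based MITM above can be sketched in miniature. The cipher E/D below is an invented toy (a trivially invertible 16-bit function with an 8-bit key, not DES or any real cipher), chosen so the whole key space fits in memory; all names are illustrative.

```python
# Toy meet-in-the-middle demo. E/D form an invented, trivially invertible
# 16-bit "cipher" with an 8-bit key -- a stand-in, not a real cipher.
MOD = 1 << 16
MUL = 40503                     # odd multiplier, hence invertible mod 2^16
MUL_INV = pow(MUL, -1, MOD)

def E(k, p):
    return ((p ^ (k * 257)) * MUL) % MOD

def D(k, c):
    return ((c * MUL_INV) % MOD) ^ (k * 257)

# Unknown double-encryption keys, and one known plaintext/ciphertext pair.
j_secret, k_secret = 42, 200    # inner key j, outer key k
P = 0x1234
C = E(k_secret, E(j_secret, P))

# "File A": <inner key, E(key, P)>; "file B": <outer key, D(key, C)>.
# Dicts keyed on the second field stand in for the sorted tape files.
A = {E(j, P): j for j in range(256)}
B = {D(k, C): k for k in range(256)}

# A match in the middle value suggests the key pair (inner, outer).
hits = [(A[m], B[m]) for m in A.keys() & B.keys()]
assert (j_secret, k_secret) in hits
```

With a real cipher the match list could contain false positives, which would be screened against a second plaintext/ciphertext pair.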
To substantially shorten the amount of tape used, by a factor of 2^n, at the expense of evaluating E and D 2^n times more often, do the following:
For m from 0 to 2^n - 1 Do
    Produce file A with records <k, E(k, P)> for each k where (the right n bits of E(k, P)) = m, discarding other records.
    Produce file B with records <j, D(j, C)> for each j where (the right n bits of D(j, C)) = m.
    Sort files A and B on the second field.
    Pass over the files looking for records from A that match records from B in the second field.
Enddo
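The segmented loop can be sketched the same way. As before, E/D are an invented toy 16-bit cipher with 8-bit keys (purely illustrative, not DES); a mask on the low n bits splits the work into 2^n small passes, trading repeated evaluation of E and D for much smaller tables.

```python
# Segmented MITM sketch: 2^n passes, each keeping only records whose low
# n bits equal m, so each pass's tables are about 2^n times smaller.
MOD = 1 << 16
MUL = 40503                      # odd => invertible mod 2^16
MUL_INV = pow(MUL, -1, MOD)

def E(k, p):
    return ((p ^ (k * 257)) * MUL) % MOD

def D(k, c):
    return ((c * MUL_INV) % MOD) ^ (k * 257)

j_secret, k_secret = 42, 200
P = 0x1234
C = E(k_secret, E(j_secret, P))

n = 4
mask = (1 << n) - 1
hits = []
for m in range(1 << n):          # one pass per value of the low n bits
    A, B = {}, {}
    for j in range(256):
        v = E(j, P)
        if v & mask == m:        # discard records outside this segment
            A[v] = j
    for k in range(256):
        v = D(k, C)
        if v & mask == m:
            B[v] = k
    hits += [(A[v], B[v]) for v in A.keys() & B.keys()]

assert (j_secret, k_secret) in hits
```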
The idea of saving only outputs where certain bits are constant is the key to the "distinguished points" method, which is used to save space with only a modest cost in time. The other key idea is that instead of evaluating MD5(n) where n iterates on its own, you look for cycles in the recurrence x = MD5(x). Any cycle which is found which does not include the starting x will lead to a case where two values hash to the same MD5 value.

For a trivial example, suppose the output of a formula like this consists of the values 1, 4, 5, 2, 7, 8, 5, 2, 7, 8, 5, 2, 7, 8, .... Here we have a four-element cycle which leads to two different predecessors for the value 5. The brute-force way to solve this would be to save all outputs from the formula and compare each new value with all earlier values. With MD5, which has a presumably random structure and 128 bits of output, the birthday paradox suggests that you would have to create and save about 2^64 output values before finding a match. Creating 2^64 values might be possible today for the time and dollar values we are talking about, but storing them appears to be out of the question, as our earlier discussion of double DES (and other discussions of MITM here) have made clear.

The distinguished points method reduces the space requirements by only saving a fraction of the output values. For example, in the list above, we might only save multiples of 4. This would lead to 4, 8, 8, ... and it is easy to discover the match without nearly as much storage. Note, though, that 8 is not actually the value which has two predecessors; but once this match is discovered, you can go back to the previous distinguished points (4 and 8 in this case) and run them forward more carefully, looking for a match. The other real advantage of the distinguished points method is that it parallelizes very nicely.
Several machines can run x = MD5(x) with different starting values, saving all of the distinguished outputs, and we can look for matches between machines as well as within one machine. Again, a match implies two different predecessors for the same value, which is an MD5 collision.

With the size of MD5, suppose we generate 2^64 outputs but only save those for which the low-order 32 bits are 0 as our distinguished points. Only 1/2^32 of values will qualify, so we will end up with about 2^32 saved outputs, probably a manageable amount. Chances are there will be a match among that set. We then go back to the distinguished points just before the match and work forward carefully to look for the exact pair of values which lead to the same successor. Distinguished points will be about 2^32 apart, so this step is easy and quick. If you want to speed it up still more, you can do a recursive distinguished-points pass for this step, using, say, d.p.'s with the low-order 16 bits equal to 0, and do it in two steps that will both be very short.

The net result is that we have taken virtually no more time (the 2^64 computations of MD5 will dominate) and virtually no space (compared to 2^64 stored values), and we get the effect of a birthday attack. This is another cautionary data point about the risks of relying on space costs for security rather than time costs.

Hal
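The parallel distinguished-points search can be sketched at toy scale. Everything below is a scaled-down stand-in: a 24-bit truncation of MD5 replaces the full function (so a birthday match appears after roughly 2^12 steps instead of 2^64), 8 zero bits mark a distinguished point instead of 32, and a loop over seeds stands in for separate machines; all names are mine, not from the original machine design.

```python
import hashlib

def H(x):
    # 24-bit truncated MD5: collisions appear after ~2^12 steps, not 2^64
    d = hashlib.md5(x.to_bytes(3, "big")).digest()
    return int.from_bytes(d[:3], "big")

DP_MASK = 0xFF          # "distinguished" = low 8 bits zero (1 point in 256)

def run(x, steps):
    for _ in range(steps):
        x = H(x)
    return x

def backtrack(s1, n1, s2, n2):
    """Both trails reach the same distinguished point after n1 / n2 steps.
    Align them to the same distance from it, then walk in lockstep to find
    the exact pair of distinct values with a common successor."""
    a, b = run(s1, max(0, n1 - n2)), run(s2, max(0, n2 - n1))
    if a == b:
        return None     # one seed lay on the other's trail: useless match
    while H(a) != H(b):
        a, b = H(a), H(b)
    return a, b

def find_collision(num_seeds=50, cap=1 << 16):
    seen = {}           # distinguished point -> (seed, steps to reach it)
    for seed in range(1, num_seeds + 1):    # each seed plays one "machine"
        x, step = seed, 0
        while step < cap:
            x = H(x)
            step += 1
            if x & DP_MASK:
                continue                    # not distinguished: don't store
            if x in seen:
                if seen[x][0] == seed:
                    break                   # looped on itself; next seed
                pair = backtrack(seed, step, *seen[x])
                if pair:
                    return pair
                break
            seen[x] = (seed, step)
    return None

pair = find_collision()
assert pair is not None
a, b = pair
assert a != b and H(a) == H(b)              # a genuine collision in H
```

Note that only the distinguished points are ever stored, exactly as in the text: the backtracking pass re-derives everything else by re-running the trails.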
Hal discusses using the Distinguished Points method to find hash collisions, presented by Michael Wiener with Paul van Oorschot at Rump Crypto '94, and lists two benefits: (1) it saves space in searching for loops on a single processor; (2) it allows parallel searches for collisions over multiple processors. I claim it's useful only for (2), because another algorithm dominates it for single-processor loop detection... at least in storage space.

It works as follows: generate a sequence of values v(i+1) = MD5(v(i)); simultaneously generate another sequence w(i+1) = MD5(MD5(w(i))), starting them at the same place, v(0) = w(0). That is, you're running one of them twice as fast as the other. At each iteration you compare v(i) with w(i), and if they're equal, you've looped. Drawing a few rho-shaped trajectories on paper and following them around with two pencils should be enough to complete a proof by hand-waving that it always catches a cycle, though perhaps not as soon as the distinguished points would. The distinguished points across machines is a great idea for (2), though, and doesn't depend on anything looping... cool stuff!

Do you (Hal?) or anybody else know whether Wiener and van Oorschot were taking into account the contraction of the range each time you iterate MD5? I think the size of the set of all numbers that are the result of MD5ing a 128-bit number is considerably smaller than 2^128... is it 1/e of that? Anybody know about random mappings? Subsequent iterations reduce it further, though of course not by 1/e each time, so the set of numbers that are the result of iteratively MD5ing a number N times should be an appreciably smaller set to be groping around in. For example, I iterated the right-most 14 bits of SHA 26,539 times from one seed before the range shrank to a single point. Note that it need not shrink that far in general, since some of the survivors would typically map into each other.

Jim Gillogly 18 Halimath S.R. 1994, 16:12
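The two-speed sequence Jim describes is the classic two-pointer cycle finder (commonly attributed to Floyd). A minimal sketch, again using a 24-bit truncation of MD5 as a toy stand-in for the full function, with a second phase (not in Jim's description) that extracts the actual colliding pair once the loop is detected:

```python
import hashlib

def H(x):
    # toy 24-bit hash (truncated MD5) so cycles show up quickly
    return int.from_bytes(hashlib.md5(x.to_bytes(3, "big")).digest()[:3], "big")

def rho_collision(x0):
    """Run v at single speed and w at double speed until v == w (somewhere
    on the cycle); then restart one pointer at x0 and step both singly.
    They first meet at the cycle entry, whose two predecessors collide."""
    v, w = H(x0), H(H(x0))
    while v != w:
        v, w = H(v), H(H(w))
    a, b = x0, v
    if a == b:
        return None                 # x0 itself lies on the cycle: no tail
    while H(a) != H(b):
        a, b = H(a), H(b)
    return a, b                     # distinct values with H(a) == H(b)

pair = rho_collision(123456)
if pair:
    a, b = pair
    assert a != b and H(a) == H(b)
```

Storage is constant, as Jim says, at the cost of three hash evaluations per step; the distinguished-points method spends storage to avoid the double-speed pointer.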
Jim Gillogly <jim@rand.org> writes:
Hal discusses using the Distinguished Points method to find hash collisions presented by Michael Wiener with Paul van Oorschot at Rump Crypto '94, and lists two benefits:
(1) saves space in searching for loops on a single processor; (2) allows parallel searches for collisions over multiple processors.
I claim it's useful only for (2), because another algorithm dominates it for single processor loop detection... at least in storage space. ["rho" method elided]
Yes, this is a good point, the main advantage of the DP algorithm is that it parallelizes. Rho does have the problem that you have to run 3 MD5's for each step, but OTOH it does not have the overhead of saving and checking the distinguished points, so which one would be best on a single processor would depend on the relative costs.
Do you (Hal?) or anybody else know whether Wiener and van Oorschot were taking into account the contraction of the range each time you iterate MD5? I think the size of the set of all numbers that are the result of MD5ing a 128-bit number is considerably smaller than 2^128... is it 1/e of that? Anybody know about random mappings?
They didn't mention anything about this, and I would think they would have if they had considered it. My intuition was that x=MD5(x) would cover a large fraction of the 128 bit output space, but on further thought Jim appears to be right: with n input values into a random function (n would be 2^128 in this case), the chance of a particular output being missed for any one input would be 1-1/n, and the chance of it being missed for all n inputs would be (1-1/n)^n. Taking the limit as n approaches infinity gives 1/e as the fraction of values which would be missed. This means that the fraction of hits would be 1 - 1/e, much lower than I had guessed.
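The limit Hal computes is easy to check numerically; any large n stands in for 2^128, since (1 - 1/n)^n converges to 1/e quickly.

```python
import math

n = 2 ** 20                      # stand-in for 2^128; the limit converges fast
miss = (1 - 1 / n) ** n          # chance a given output is produced by no input
hit = 1 - miss                   # fraction of outputs covered after one pass

assert abs(miss - 1 / math.e) < 1e-4
assert abs(hit - (1 - 1 / math.e)) < 1e-4   # about 0.632 of the space
```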
Subsequent iterations reduce it further, though of course not by 1/e each time, so that the set of numbers that are the result of iteratively MD5ing a number N times should be an appreciably smaller set to be groping around in.
The way I figure it, if the fraction of the original n is f (which would be 1 before the first iteration, and 1 - 1/e before the 2nd iteration based on the above), the chance of a point being missed is (1-1/n)^(nf), which is 1/e^f. So f would be found by f = 1 - 1/e^f, iterating once per MD5 iteration and starting f at 1. I just did an experiment of iterating this. After 100 times f was about .02; after 1000 times f was about .002, suggesting f = 2/iterations. If this is right, you might be able to get a birthday match after only the cube root of n tries rather than the square root of n, or about 2^44 iterations or so rather than 2^64, because at that point you are only looking at 2^85 possible output values. This result is only really valid for serial machines; parallel ones search more per iteration so this would move you back towards the 2^64 number. It does imply that you don't really get k-fold speedup with k machines if you take this effect into consideration.
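Hal's experiment is easy to reproduce: iterate f <- 1 - e^(-f) starting from f = 1 and watch f approach 2/iterations.

```python
import math

f = 1.0                        # fraction of the space surviving, before iteration 1
snapshots = {}
for i in range(1, 1001):
    f = 1 - math.exp(-f)       # surviving fraction after one more MD5 iteration
    if i in (100, 1000):
        snapshots[i] = f

# matches the figures in the text: f ~ .02 after 100 iterations,
# ~ .002 after 1000 -- i.e. f is roughly 2/iterations
assert abs(snapshots[100] * 100 - 2) < 0.3
assert abs(snapshots[1000] * 1000 - 2) < 0.3
```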
Jim Gillogly 18 Halimath S.R. 1994, 16:12
Gee, my calendar must be off! Hal
Hal discusses using the Distinguished Points method to find hash collisions presented by Michael Wiener with Paul van Oorschot at Rump Crypto '94, and lists two benefits:
(1) saves space in searching for loops on a single processor; (2) allows parallel searches for collisions over multiple processors.
I claim it's useful only for (2), because another algorithm dominates it for single processor loop detection... at least in storage space.
[...describes nifty algorithm (which seems to be well-known in the folklore?) for finding cycles in linear time and constant space...]

Yeah! I was discussing this algorithm 4 or 5 months ago on alt.math.iams; it's quite elegant. If there is a collision after the n-th value, then I believe this algorithm will find it after generating (at most) 2n values. It's been kinda simmering in the back of my head for months, me wondering how to parallelize this algorithm -- and it's really cool to see how Wiener and van Oorschot found a way to find cycles efficiently in parallel! Apparently two professors here (Yao & Sedgewick) wrote a paper on this in the SIAM Journal on Computing in 1981 -- I'm gonna go dig through the library to see if I can find it, when I get a chance...
The distinguished points across machines is a great idea for (2), though, and doesn't depend on anything looping... cool stuff!
Uh.. I think it *does* depend on looping! A collision in *any* point means that there will soon be a collision in a distinguished point, when you use looping. This probably won't be true with any other generation method. Suppose we use the sequence a_n = MD5(n). Then a collision a_i = a_j will only be detected if a_i is a distinguished point. But because we use the sequence a_n = MD5( a_{n-1} ), a collision a_i = a_j implies that there will soon be a collision a_{i+m} = a_{j+m} with a_{i+m} a distinguished point (after m ~= 2^32 extra iterations, on average, if 1 in 2^32 points are distinguished).
Do you (Hal?) or anybody else know whether Wiener and van Oorschot were taking into account the contraction of the range each time you iterate MD5? I think the size of the set of all numbers that are the result of MD5ing a 128-bit number is considerably smaller than 2^128... is it 1/e of that?
Hrmm, why should this change the expected number of iterations required to find a collision? If I'm being dense, hopefully you'll spell it out for me. :-)

I've been thinking about writing a program to test the single-processor cycling algorithm with (for example) crypt(3) for a while now -- maybe this'd be a good excuse to write it now, and try the parallel distinguished point stuff, too. Does anybody think it'd be interesting to get some practical experience here? Sound like an interesting, doable project? A few things I've been thinking about, which maybe will spark your interest enough to answer all my questions. (one can always hope! :-)

First of all, there's some non-zero probability that (when using the parallelized distinguished points algorithm) two processors will have their streams match exactly without yielding a useful collision. Suppose one processor picks the random starting value 3 and generates a sequence starting with 3, 1, 4, 5, 2, 7, 9, ... Now further suppose that MD5(6) = 3 and that another processor picks the random starting value 6; then the second processor will generate 6, 3, 1, 4, 5, 2, 7, 9, ... We'll eventually notice this: if 9 is a distinguished point, then we'll see that two processors have seen the value 9, and we'll start backtracking, but we won't get any useful collision in MD5 out of this -- we'll only get the information that MD5(6) = 3, which is useless, since both 6 and 3 were random choices. This means that the second processor's computing power was wasted. Can anyone estimate how often this will happen, so that we can know it won't slow things down too much?

Also, there was the arbitrary choice of making the distinguished points be those with the lower 32 bits all zero -- I wonder what is the effect of requiring (say) all 48 least significant bits to be zero? This will increase the time required to backtrack (unless some fancy schmancy recursive or parallel algorithm is used?)
but it would also decrease the space and inter-chip communication required significantly. Any comments?

Another thing -- I'm not sure this method is (directly) useful for generating lots of collisions, if that is what is desired. I believe Dr. Hellman wrote some paper about the cycling properties of random functions (out of interest in DES), and he concluded (if I remember correctly) that when you generate lots of random starting values and look at their cycling properties, most starting values will drain into a very few specific cycles. [I think this was in some volume of CRYPTO: maybe '86 or so? I think the title was something like "Drainage properties of the DES" or somesuch. I'll have to look it up.] Doesn't that reduce the number of different collisions that you can generate by a large factor? If so, are there any simple modifications to the iteration function which would help? How about a_n = MD5( a_{n-1} XOR V ) for some random V picked anew each time we want a new collision?

Finally, is there a way to adapt an approach like this to reduce the space requirements needed to break double DES? Let P and P' be two plaintexts, and C = E(k, E(k', P)) and C' = E(k, E(k', P')) be their encipherments under double DES; we want to find the unknown keys k, k'. For any X in {0,1}^128, define the function h : {0,1}^128 -> {0,1}^128 by h(X) = E(y, P) concatenated with E(y, P') if z = 0, or h(X) = D(y, C) concatenated with D(y, C') if z = 1, where y consists of bits 0-55 of X and z is bit 56 of X. If h(X) = h(X') and X != X' and z != z', then with high probability the collision in h gives us the enciphering keys y and y'. Can we use some parallel distinguished-points-cycling-like algorithm to find the appropriate collision in h? If we generate enough values of h, we will exhaust the entire keyspace, and will necessarily find the enciphering keys. (By the coupon collector's paradox, this should require something like 2^57 * 57 * log 2 iterations or so on average.)
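This h construction can be sketched at toy scale. Below, an invented 16-bit cipher with 8-bit keys replaces DES (all names mine), X carries 8 key bits plus one direction bit z, and a plain lookup table stands in for the distinguished-points cycling being asked about. In this toy, the cipher happens to be injective in its key, so the z = z' false collisions that real DES would produce don't occur here.

```python
# Toy version of the h(X) trick for double encryption. E/D form an
# invented 16-bit cipher with 8-bit keys -- NOT DES -- kept tiny so the
# whole domain of h can be enumerated.
MOD = 1 << 16
MUL = 40503                       # odd => invertible mod 2^16
MUL_INV = pow(MUL, -1, MOD)

def E(k, p):
    return ((p ^ (k * 257)) * MUL) % MOD

def D(k, c):
    return ((c * MUL_INV) % MOD) ^ (k * 257)

k_inner, k_outer = 42, 200        # the unknown keys k' and k
P, Pp = 0x1234, 0x5678            # two known plaintexts
C, Cp = E(k_outer, E(k_inner, P)), E(k_outer, E(k_inner, Pp))

def h(X):
    y, z = X & 0xFF, (X >> 8) & 1           # 8 key bits, 1 direction bit
    if z == 0:
        return (E(y, P) << 16) | E(y, Pp)   # encrypt direction
    return (D(y, C) << 16) | D(y, Cp)       # decrypt direction

# Brute-force collision search over h's 9-bit domain (a table, not the
# cycling method -- at real scale this is where distinguished points
# would come in).
table = {}
useful = []
for X in range(1 << 9):
    v = h(X)
    if v in table and (table[v] >> 8) != (X >> 8):
        useful.append((table[v], X))        # collision with z != z'
    table[v] = X

X0, X1 = useful[0]
enc_side = X0 if (X0 >> 8) == 0 else X1     # z = 0 side names the inner key
dec_side = X1 if (X0 >> 8) == 0 else X0     # z = 1 side names the outer key
assert (enc_side & 0xFF, dec_side & 0xFF) == (k_inner, k_outer)
```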
The only problem is that there will probably be lots of collisions X, X' with h(X) = h(X') and X != X' and z = z' -- I think. Can anyone think of a way to deal with these useless collisions in h, to make finding a useful collision in h easy? If so, this should give a method to break double DES in 2^64 time and very little memory. But maybe this is all useless drivel...

Anyhow, this message has gotten very long. Thanks for reading. And many many thanks to Hal for typing in the description of Wiener and van Oorschot's idea!

-------------------------------------------------------------------------------
David Wagner dawagner@princeton.edu
participants (3)
- David A. Wagner
- Hal
- Jim Gillogly