Question re: MD5/other key-crunching methods

17 Dec 2003

      On the plane back home, I had the pleasure of being treated to a 
screening of "Sgt. Bilko". Not a bad movie overall, but had a nice 
throwaway crypto line. It got me to thinking, though...

Is it possible to make generalizations about the MD5 hashes of classes of 
input values? That is, can one say that "no input values of length 
greater than 512 bits will..." or 'all input values starting with the 
value 3 have a tendency to..." with any degree of probability? I know 
hash functions strive to evenly distribute values over their range, but I 
wonder if it might sometimes be possible to predict the hash of a value 
without computing it.

Why? Well, it's mainly in regards to the way MD5 and other hash functions 
are used in mapping pass phrases to actual key values for a cipher.

Suppose I have a situation in which I feel comfortable in making
certain generalizations about the passphrase. Perhaps it's all lowercase, 
perhaps all alphanumeric, has five hyphens, whatever. Information which 
may allow one to restrict the passphrase to a certain range.

In a system where the passphrase is the encryption key, that range of key 
values can be doled out and searched sequentially. Since they are likely 
to be one or several contiguous blocks, one may simply distribute the 
task of searching each one to willing machines everywhere. The efforts 
with respect to RC4-40 in the previous year prove that much. If I can 
rule out even 10% of all possible keyvalues, I've saved a good deal of 
time.

What if one is dealing with a passphrase key-crunched w/MD5, though? The 
obvious way to go about it is to compute the MD5 hash for each and every 
value in the given range, then test that set of keys. This is an extra 
step, and adds a measure of extra time to the whole operation. Sure, one 
may abstract it away by claiming it's trivial compared to the problem of 
searching an exponetially large keyspace, but that seems something of a 
cop-out. 

Perhaps it's a silly question, but is it possible to identify a set of 
hashes which correspond to a set of domain values w/o performing the hash 
itself? I'm aware that it's not possible to reverse a one-way hash like 
MD5 (wish we could...what a compression ratio!), and I know "good" hash 
functions strive for properties which would make this exceedingly 
difficult. However, has anyone looked at the question? Is it worth 
considering? 

Thanks.

-David Molnar

Question re: MD5/other key-crunching methods

DAVID A MOLNAR