
On the plane back home, I had the pleasure of being treated to a screening of "Sgt. Bilko". Not a bad movie overall, but had a nice throwaway crypto line. It got me to thinking, though... Is it possible to make generalizations about the MD5 hashes of classes of input values? That is, can one say that "no input values of length greater than 512 bits will..." or 'all input values starting with the value 3 have a tendency to..." with any degree of probability? I know hash functions strive to evenly distribute values over their range, but I wonder if it might sometimes be possible to predict the hash of a value without computing it. Why? Well, it's mainly in regards to the way MD5 and other hash functions are used in mapping pass phrases to actual key values for a cipher. Suppose I have a situation in which I feel comfortable in making certain generalizations about the passphrase. Perhaps it's all lowercase, perhaps all alphanumeric, has five hyphens, whatever. Information which may allow one to restrict the passphrase to a certain range. In a system where the passphrase is the encryption key, that range of key values can be doled out and searched sequentially. Since they are likely to be one or several contiguous blocks, one may simply distribute the task of searching each one to willing machines everywhere. The efforts with respect to RC4-40 in the previous year prove that much. If I can rule out even 10% of all possible keyvalues, I've saved a good deal of time. What if one is dealing with a passphrase key-crunched w/MD5, though? The obvious way to go about it is to compute the MD5 hash for each and every value in the given range, then test that set of keys. This is an extra step, and adds a measure of extra time to the whole operation. Sure, one may abstract it away by claiming it's trivial compared to the problem of searching an exponetially large keyspace, but that seems something of a cop-out. Perhaps it's a silly question, but is it possible to identify a set of hashes which correspond to a set of domain values w/o performing the hash itself? I'm aware that it's not possible to reverse a one-way hash like MD5 (wish we could...what a compression ratio!), and I know "good" hash functions strive for properties which would make this exceedingly difficult. However, has anyone looked at the question? Is it worth considering? Thanks. -David Molnar