I have followed with interest this discussion of passphrase "entropy". What I'm not clear on is the effect of a hashing algorithm on the final entropy. If I come up with a "random" set of printable characters which contain 128 bits of entropy, and feed them to MD5, let's say, will I still have 128 bits of entropy on the output? Or do I need some sort of safety margin above 128 bits to "be sure"?

What's lurking in the back of my mind is this -- if you enter something with LESS than 128 bits, the hashing algorithm has to "pad" or otherwise fill in the missing bits from <somewhere>. Now if I have entered a phrase with EXACTLY 128 bits of entropy, hypothetically, is that enough to have flushed the padding or whatever out of the pipeline?

Can we really treat MD5 as a "magic black box", or does the optimal input require a knowledge of how the box works?
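One piece of this is checkable directly: MD5's padding is completely deterministic, so whatever it fills in comes from the message itself plus fixed constants, never from fresh randomness. A minimal Python sketch of the padding rule (the helper name md5_pad is ours, for illustration):

    import hashlib
    import struct

    def md5_pad(message: bytes) -> bytes:
        # MD5's fixed padding: a single 1 bit (0x80), zero bytes until the
        # length is 56 mod 64, then the original bit length as a 64-bit
        # little-endian integer. Nothing here depends on anything secret.
        bit_len = 8 * len(message)
        padded = message + b"\x80"
        padded += b"\x00" * ((56 - len(padded)) % 64)
        padded += struct.pack("<Q", bit_len)
        return padded

    phrase = b"sixteen chars!!!"          # exactly 16 bytes = 128 bits of input
    assert len(md5_pad(phrase)) == 64     # one full 64-byte MD5 block
    print(hashlib.md5(phrase).hexdigest())

A 16-character phrase exactly fills 128 bits; the remaining 48 bytes of the block are fixed padding, so while the padding adds no entropy, it injects no noise of its own either.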
On Mon, 4 Jul 1994 nobody@shell.portal.com wrote:
Now if I have entered a phrase with EXACTLY 128 bits of entropy, hypothetically, is that enough to have flushed the padding or whatever out of the pipeline?

I have had this question also. Has it been shown that the transformation of 128-bit words through MD5 is *theoretically* invertible? If it is not, iterating it 1024 times could actually make you *LOSE* entropy. (Say it was a random transformation: its image would not contain each of the 128-bit outputs, i.e. some inputs would map to the same output.)
I am not aware of any such result.

Roger.
MD5, like all hash functions, is a many-to-one function. This means that theoretically there are an infinite number of messages that will hash to the same value. This also means that reverting from the hash back to your original message is nigh impossible. The security of MD5 is based upon the fact that *finding* two messages that hash to the same value is as difficult as a brute-force attack, which requires 2^128 trials (maybe it's 2^127, but I don't think that really matters).

I don't believe that multiple iterations of MD5 will cause you to lose entropy. Actually, you will lose entropy on the *first* iteration, since MD5 will *only* let you have 128 bits of entropy, since there are only 128 bits in the output. In subsequent iterations, you just move those bits around.

Does this answer your question?

-derek

Derek Atkins, SB '93 MIT EE, G MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
Home page: http://www.mit.edu:8001/people/warlord/home_page.html
warlord@MIT.EDU    PP-ASEL    N1NWH    PGP key available
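Whether iteration "just moves bits around" or slowly collapses them can be probed at toy scale. The sketch below is a scaled-down model, assuming MD5 truncated to 16 bits behaves like full MD5 in miniature (an assumption, not an established property of the full hash); it counts how many distinct values survive repeated hashing over the whole 2^16-element domain:

    import hashlib
    import math

    BITS = 16                  # scaled-down model: a 16-bit "MD5" instead of 128
    SIZE = 1 << BITS

    def h(x: int) -> int:
        # MD5 truncated to 16 bits -- a toy stand-in, not real MD5 behavior.
        digest = hashlib.md5(x.to_bytes(2, "big")).digest()
        return int.from_bytes(digest[:2], "big")

    values = set(range(SIZE))  # start with the full domain: 16 bits of entropy
    for step in range(1, 11):
        values = {h(x) for x in values}
        print(f"step {step:2d}: {len(values):5d} distinct values "
              f"(~{math.log2(len(values)):.2f} bits)")

If the truncated hash behaves like a random mapping, roughly two thirds of a bit vanish on the first step and the image shrinks only slowly after that.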
On Mon, 4 Jul 1994, Derek Atkins wrote:
Does this answer your question?

No.
Again, the only way that MD5 can preserve the entropy of a string is for every single 128-bit string to map onto a unique 128-bit string, for if two 128-bit strings produce the same output, then you lose entropy. The question is: when MD5 is restricted to 128-bit inputs, does it lose entropy, and if so, how much? As much as a random mapping? If so, the 1024-fold iteration in Secure Drive HARMS security.

Roger.
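For scale, if MD5 restricted to 128-bit inputs behaved like a uniformly random mapping f on a set of N = 2^128 elements (an assumption, not an established property of MD5), a standard counting argument gives the expected image size of a single application:

    \mathbb{E}\,\bigl|f([N])\bigr| \;=\; N\Bigl(1 - \bigl(1 - \tfrac{1}{N}\bigr)^{N}\Bigr) \;\approx\; N\,(1 - e^{-1}) \;\approx\; 0.632\,N

so one application would lose about -log2(1 - 1/e) ~ 0.66 bits, and further iterations shrink the image only slowly. Under the random-mapping assumption, that is the order of loss in question.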
On Mon, 4 Jul 1994, Derek Atkins wrote:
is based upon the fact that *finding* two messages that hash to the same value is as difficult as a brute-force attack, which requires 2^128 trials (maybe it's 2^127, but I don't think that really matters).

This is incorrect. With a large memory, this is the birthday paradox in action, and it takes about 2^64 tries, which puts SHS right up there at 2^80, the same as Skipjack.
Even with less memory, you can still improve on this, though not as much.

Roger, Mad Dog Libertarian, Bryner.
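The 2^64 figure is the standard birthday bound. For an n-bit hash, after k random trials the chance of some collision is approximately

    p(k) \approx 1 - e^{-k^{2}/2^{\,n+1}}, \qquad p\bigl(2^{n/2}\bigr) \approx 1 - e^{-1/2} \approx 0.39

so MD5 (n = 128) collides after about 2^64 trials, and the 160-bit SHS after about 2^80, matching Skipjack's 80-bit keyspace against exhaustive search. (This is the usual approximation; exact constants differ by a small factor.)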
On Mon, 4 Jul 1994, Derek Atkins wrote:

MD5, like all hash functions, is a many-to-one function. This means that theoretically there are an infinite number of messages that will hash to the same value. This also means that reverting from the hash back to your original message is nigh impossible. The security of MD5 is based upon the fact that *finding* two messages that hash to the same value is as difficult as a brute-force attack, which requires 2^128 trials (maybe it's 2^127, but I don't think that really matters).

Hmm, I read this as saying, first, that reverting is impossible, as it generally is when you start with 1MB and hash it to 128 bits (or compression would be neat!), and then that finding two messages that hash to the same value is as difficult as brute force, which is not really true, if taken literally.

Perhaps my original question about cycles and entropy loss is better in the context of a broken system such as MD4. Are there 128-bit messages in MD4 which hash to the same value, and if so, what insight into the cycle length vs. string length would it give us?

Let's say each dot is a 128-bit number; a string could feed a cycle, such as shown below. When this occurs, you lose entropy, as it ceases to be sequentially dependent on a 128-bit number, and depends instead on a subset of the cycle.

    ==> .......................
                           .   .
                            .....

Here is an example hash function, for two 64-bit words a, b:

    hash(a,b) = (a+b, a-b); now hash^2(a,b) = (2a, 2b).

So here you have lost 1 bit of information when you start to iterate the hash function, and will be left with exactly 1 option after 128 iterations of this function in every case. This is why I won't use Secure Drive with the 1024 option, as I view it as a SERIOUS NEGATIVE THREAT TO THE SECURITY OF THE SYSTEM. Changing this to encrypting 1024 times with IDEA and a key generated by a PRNG has no such security hole possible, and is what I would view as a proper "busy work function[TM]", although nothing has been said about its irreducibility. I would recommend replacing that option or discarding it, that is unless hash functions never throw away bits in sizes smaller than their output size. (Again, that was my question.)

Roger.
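Roger's toy hash is easy to run to exhaustion. A minimal sketch in Python, using 8-bit words in place of his 64-bit ones so the whole state space can be enumerated (the narrower word size is the only liberty taken):

    MASK = 0xFF   # 8-bit words stand in for Roger's 64-bit ones

    def toy_hash(a, b):
        # Roger's example hash(a, b) = (a + b, a - b), with wraparound.
        return (a + b) & MASK, (a - b) & MASK

    states = {(a, b) for a in range(256) for b in range(256)}
    step = 0
    while len(states) > 1:
        states = {toy_hash(a, b) for a, b in states}
        step += 1
        print(f"iteration {step:2d}: {len(states):6d} reachable states")
    # hash^2(a, b) = (2a, 2b) mod 256, so each pair of iterations halves the
    # usable range of each word; after 16 iterations only (0, 0) remains.

It collapses to the single state (0, 0) after 16 iterations, the 8-bit analogue of "exactly 1 option after 128 iterations."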
Are there 128-bit messages in MD4 which hash to the same value, and if so, what insight into the cycle length vs. string length would it give us?
If there are, then you have broken MD4! This is the definition of breaking a hash: finding two strings (of *any* size) that hash to the same value. Let me comment on something you wrote:
hash(a,b) = (a+b, a-b); now hash^2(a,b) = (2a, 2b).
So here you have lost 1 bit of information when you start to iterate the hash function, and will be left with exactly 1 option after 128 iterations of this function in every case.
If we make a small adjustment to the definition of this hash routine, and define the hash to be:

    hash(a,b) = ((a+b) mod 2^64, (a-b) mod 2^64)

then I argue that you will not lose that bit of information, since it will just wrap around the 64-bit values instead of just doing a bit-shift. The point here is that if MD5 lost entropy, it would probably make it easier to find two strings that hash to the same value, which, by definition, breaks that hash.
I would recommend replacing that option or discarding it, that is unless hash functions never throw away bits in sizes smaller than their output size. (Again, that was my question.)
They shouldn't. I refer back to my last statement, that if they did, it would make breaking the hash much easier.

I hope this helps.

-derek
On Tue, 5 Jul 1994, Derek Atkins wrote:
Roger:
I would recommend replacing that option or discarding it, that is unless hash functions never throw away bits in sizes smaller than their output size. (Again, that was my question.)
They shouldn't. I refer back to my last statement, that if they did, it would make breaking the hash much easier.
This refers to the Secure Drive 1024 iterations of MD5. Without a proof that md5(128-bit number) is a one-to-one transformation, my statement about losing entropy is a possibility. I don't think it has been demonstrated that md5^1024 is more secure than md5. NOBODY HAS IMPLIED THAT SUCH A PROOF, or an equivalent proof, exists.

Roger.
Nobody wrote:
I have followed with interest this discussion of passphrase "entropy". What I'm not clear on is the effect of a hashing algorithm on the final entropy. If I come up with a "random" set of printable characters which contain 128 bits of entropy, and feed them to MD5, let's say, will I still have 128 bits of entropy on the output? Or do I need some sort of safety margin above 128 bits to "be sure"?
What's lurking in the back of my mind is this -- if you enter something with LESS than 128 bits, the hashing algorithm has to "pad" or otherwise fill in the missing bits from <somewhere>. Now if I have entered a phrase with EXACTLY 128 bits of entropy, hypothetically, is that enough to have flushed the padding or whatever out of the pipeline?
Can we really treat MD5 as a "magic black box", or does the optimal input require a knowledge of how the box works?
Consider a cellular automaton... the Game of Life is a simple example in 2-D, but 1-D versions have been studied extensively. It starts with the string "1 0 1" and iterates/crunches on it, producing this output:

            1 0 1
          1 1 0 1 0
        0 1 0 1 0 0 0
      0 1 1 0 0 0 1 0 1
    1 0 0 1 0 1 1 1 0 1 1
    (etc.)

Now does the final string, a seemingly random-looking and "high-entropy" string, actually have high entropy? No, not if the machine (CA rule set) that generated it is known.

(As an aside, encrypted strings _appear_ to have high entropy, but generally they don't actually have this high entropy... because they are actually fairly low entropy strings like "Frost in Brazil, buy coffee futures today." Such strings are called "cryptoregular.")

In the above case, one can treat the machine as the key. Stephen Wolfram conjectured that cellular automata could be used for encryption. I think it was later proved, not too surprisingly to me at least, that his CA-based systems were formally equivalent to linear feedback shift registers (LFSRs), which are not very strong.

The point I want to make, though, is that the 3 bits started with (1 0 1) turn into 40 or 100 or whatever bits through the process of crunching on them. Things which give evidence of having a lot of "history" or computation behind them are said to have high "logical depth." The most obvious example around us is _life_. For example, it is often claimed by certain enthusiasts of nanotechnology that the creation of life-like agents should be relatively easy because, for example, E. coli "only" contains a few megabytes of code in its DNA. Since we can make _chips_ that store this amount of code.... Aargghh! The problem is _which_ code! A few meg doesn't sound like much, but E. coli only lives when the code is the right code, a relative few of the 2^1,000,000 or more sequences that are possible. (Now that's a search space!) Life has had several billion years and incredible numbers of generations to find the interesting places in "DNA space." This is what is meant by logical depth.

Back to crypto. The point "nobody" made about MD5 and the like "padding out" the bits is a good one. There are, in a sense, no more bits of entropy than one started with, because MD5 and similar hashes are _deterministic_. But an attacker must contend with the increased logical depth, which is in some sense orthogonal to bit entropy (randomness). (If I could draw a picture here, it would have an x-axis representing bit entropy and a y-axis representing logical depth.) This can slow down an attack, in that the attacker probably (*) needs to do certain computations to track this logical depth. Like requiring someone in a contest to stop and do some computations, even if deterministic. I don't know of any good analyses of the cryptographic effects of such lines of thinking.

(* I said "probably" because there's always the possibility that what Alice thinks is an extra set of computations her hash is forcing Bob to do is not actually needed, that Bob knows of some tricks that allow him to bypass them. A standard crypto problem.)

Well, sorry for the long discussion. This business of logical depth is near and dear to me, and is a part of "algorithmic information theory," the field pioneered by Kolmogorov and Chaitin. Lots of interesting resonances with crypto.

--Tim May
..........................................................................
Timothy C. May         | Crypto Anarchy: encryption, digital money,
tcmay@netcom.com       | anonymous networks, digital pseudonyms, zero
408-688-5409           | knowledge, reputations, information markets,
W.A.S.T.E.: Aptos, CA  | black markets, collapse of governments.
Higher Power: 2^859433 | Public Key: PGP and MailSafe available.
"National borders are just speed bumps on the information superhighway."
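Tim's toy evolution is easy to reproduce. A minimal sketch, using Wolfram's rule 30 as the update rule (the archive doesn't say which rule produced the rows above, and they don't obviously match rule 30; the rule choice here is just an illustrative assumption):

    RULE = 30   # Wolfram's rule 30 -- an assumed example rule, not
                # necessarily the one behind Tim's triangle

    def step(cells):
        # One synchronous update; the row grows by one cell on each side.
        padded = [0, 0] + cells + [0, 0]
        return [(RULE >> ((padded[i-1] << 2) | (padded[i] << 1) | padded[i+1])) & 1
                for i in range(1, len(padded) - 1)]

    row = [1, 0, 1]
    for _ in range(5):
        print(" ".join(map(str, row)).center(30))
        row = step(row)

The point survives the choice of rule: every bit in the triangle is fully determined by three starting bits plus the rule, so its apparent randomness is not entropy.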
On Mon, 4 Jul 1994, Timothy C. May wrote:
and iterates/crunches on it, producing this output:
1 0 1 1 1 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 1 1

The ALGORITHM also contains information. If the ALGORITHM is part of a secret key, so much the better. To say exactly how much information an algorithm contains is, to say the least, formidable. In the case of functions, it is simple.
Let's put the question to addition: how much entropy does + have when applied to bits?

Roger, Mad Dog, Bryner.
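One concrete reading of Roger's closing question (an interpretation, assuming two independent uniform input bits): a + b takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4, so the output carries H = 1.5 bits against 2 bits of input, and addition discards half a bit. A quick check in Python:

    from collections import Counter
    from itertools import product
    from math import log2

    # Entropy of a + b for independent uniform bits a, b.
    counts = Counter(a + b for a, b in product((0, 1), repeat=2))
    total = sum(counts.values())
    entropy = -sum(c / total * log2(c / total) for c in counts.values())
    print(entropy)  # 1.5 bits out, versus 2.0 bits in

XOR (addition mod 2) keeps the full 1 bit per output bit; it is the carry information that ordinary addition spreads unevenly.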
participants (4)
- Derek Atkins
- nobody@shell.portal.com
- Roger Bryner
- tcmay@netcom.com