Nobody wrote:
I have followed with interest this discussion of passphrase "entropy". What I'm not clear on is the effect of a hashing algorithm on the final entropy. If I come up with a "random" set of printable characters which contain 128 bits of entropy, and feed them to MD5, let's say, will I still have 128 bits of entropy on the output? Or do I need some sort of safety margin above 128 bits to "be sure"?
What's lurking in the back of my mind is this -- if you enter something with LESS than 128 bits, the hashing algorithm has to "pad" or otherwise fill in the missing bits from <somewhere>. Now if I have entered a phrase with EXACTLY 128 bits of entropy, hypothetically, is that enough to have flushed the padding or whatever out of the pipeline?
Can we really treat MD5 as a "magic black box", or does the optimal input require a knowledge of how the box works?
Consider a cellular automata...the Game of Life is a simple example it 2-D, but 1-D versions have been studied extensively. It starts with the string: "1 0 1" and iterates/crunches on it, producing this output: 1 0 1 1 1 0 1 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 1 0 0 1 0 1 1 1 0 1 1 (etc.) Now does the final string, a seemingly randomly-looking and "high-entropy" string actually have high entropy? No, not if the machine (CA rule set) that generated it is known. (As an aside, encrypted strings _appear_ to have high entropy, but generally they don't actually have this high entropy....because they are actually fairly low entropy strings like "Frost in Brazil, buy coffee futures today." Such strings are called "cryptoregular.") In the above case, one can treat the machine as the key. Steven Wienberg conjectured that cellular automata could be used for encryption. I think it was later proved, not too surprisingly to me at least, that his CA-based systems were formally equivalent to linear feedback shift registers (LFSRs), which are are not very strong. The point I want to make though is that the 3 bits started with (1 0 1) turn into 40 or 100 or whatever bits throught the process of crunching on them. Things which give evidence of having a lot of "history" or computation behind them are said to have high "logical depth." The most obvious example around us is _life_. For example, it is often claimed by certain enthusiasts of nanotechnology that the creation of life-like agents should be relatively easy because, for example, e. coli "only" contains a few megabytes of code in its DNA. Since we can make _chips_ that store this amount of code.... Aargghh! The problem is _which_ code! A few meg doesn't sound like much, but e. coli only lives when the code is the right code, a relatively few of the 2^1,000,000 or more sequences that are possible. (Now that's a search space!). Life has had several billion years and incredible numbers of generations to find the interesting places in "DNA space." This is what is meant by logical depth. Back to crypto. The point "nobody" made about MD5 and the like "padding out" the bits is a good one. There are, in a sense, no more bits of entropy than one started with, because MD5 and similar hashes are _deterministic_. But an attacker must contend with the increased logical depth, which is in some sense orthogonal to bit entropy (randomness). (If I could draw a picture here, it would have an x-axis reprsenting bit entropy and a y-axis representing logical depth.) This can slow down an attack, in that the attacker probably (*) needs to do certain computations to track this logical depth. Like requiring someone in a contest to stop and do some computations, even if deterministic. I don't know of any good analyses of the cryptographic effects of such lines of thinking. (* I said "probably" because there's always the possibility that what Alice thinks is an extra set of computations her hash is forcing Bob to do is not actually needed, that Bob knows of some tricks that allows him to bypass them. A standard crypto problem.) Well, sorry for the long discussion. This business of logical depth is near and dear to me, and is a part of "algorithmic information theory," the field pioneered by Kolmogorov and Chaitin. Lots of interesting resonances with crypto. --Tim May -- .......................................................................... Timothy C. May | Crypto Anarchy: encryption, digital money, tcmay@netcom.com | anonymous networks, digital pseudonyms, zero 408-688-5409 | knowledge, reputations, information markets, W.A.S.T.E.: Aptos, CA | black markets, collapse of governments. Higher Power: 2^859433 | Public Key: PGP and MailSafe available. "National borders are just speed bumps on the information superhighway."