Entropy, Randomness, etc.
With all this talk about entropy, randomness, and Gaussian distributions, I'm hoping we can clear up some of the confusion I'm having. My understanding of a random number is that it is a number generated from two or more unrelated events. For this number to be most useful cryptographically, it needs an even distribution, meaning that any one number in the range of possible values is as probable as any other. Does that make the distribution a Gaussian distribution? Maybe I just don't understand the theory behind random numbers well enough, but with all of these terms floating around it is hard to keep track.

The reason I want to get some of these things clarified is that I am becoming more interested in some of the analysis people have been saying is possible. Also, how are these statistical measurements done? Is it as simple as a histogram (useful for simple alphabet transliterations)? Are we talking frequency analysis with FFTs and more advanced techniques? How do we measure the entropy of a random number, or of a series of random numbers? Have people on the list done this, or is this still the province of people who do math and number theory for a living? Given a particular set of data used to generate a random key, such as a Unix box's /dev/mem, how can one measure the number of bits of entropy?

These topics are probably covered in some of the basic books in the field, but none of the references I've been able to locate go into specifics of how to measure the quality of random numbers. Unless some measurements are made, you can't really be sure that those /dev/mem MD5 hashes don't come up the same 10%, 30%, or more of the time. It seems that a lot of assumptions are being made about what is good and what isn't. I plan to build a hardware random number generator, and I have a couple of different circuits to do it, but I've heard comments that some types of noise are not "good" cryptographically.

--
Jeremy Porter ------------- Systems Engineering -------- Dell Computer Corp.
------ jerry@terminus.us.dell.com ----
--- 70 4F BD AE 6D E9 D2 66 48 18 8B E7 64 7F 59 8F ---
Support your Second Amendment rights to encryption technology.
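[For concreteness, here is a minimal sketch of the kind of histogram-based measurement asked about above: estimating bits of entropy per byte from byte frequencies. It only captures single-byte statistics, so it can overestimate but never prove randomness; the file path in the commented-out usage is just a placeholder.]

```python
import math
from collections import Counter

def byte_entropy_estimate(data: bytes) -> float:
    """Shannon entropy estimate (bits per byte) from a simple byte histogram.

    This measures only zeroth-order (single-byte frequency) statistics; it
    says nothing about correlations between bytes, so it is an upper bound
    on the per-byte entropy, not evidence of randomness by itself.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Example usage (the path is only a placeholder):
# with open("/tmp/devmem.sample", "rb") as f:
#     sample = f.read()
# print(byte_entropy_estimate(sample), "bits per byte (8 is the maximum)")
```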
This is a supplement to the fine answer to your question that Scott Collins has already provided.
How do we measure the entropy of a random number, or a series of random numbers?
Given a particular set of data used to generate a random key, such as a Unix box's /dev/mem, how can one measure the number of bits of entropy?
Actually, it can't be done. The consistent measure of entropy for finite objects like a string or a (finite) series of random numbers is the so-called ``program length complexity''. This is defined as the length of the shortest program for some given universal Turing machine which computes the string. It's consistent in the sense that it has the familiar properties of ``ordinary'' (Shannon) entropy. Unfortunately, it's uncomputable: there's no algorithm which, given an arbitrary finite string S, computes the program-length complexity of S.

Program-length complexity is well-studied in the literature. A good introductory paper is ``A Theory of Program Size Formally Identical to Information Theory'' by G. J. Chaitin, _Journal of the ACM_, 22 (1975), reprinted in Chaitin's book _Information Randomness & Incompleteness_, World Scientific Publishing Co., 1990.

John E. Kreznar    | Relations among people to be by
jkreznar@ininx.com | mutual consent, or not at all.
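[To make the definition concrete, a small illustration; the generator and its constants are just standard textbook examples, not anything from the posts. A few lines of code can emit megabytes of output whose byte histogram looks flat, yet the output's program-length complexity is roughly the size of the generating program.]

```python
def lcg_bytes(n: int, seed: int = 12345) -> bytes:
    """Generate n 'random-looking' bytes with a linear congruential generator.

    The constants are the common Numerical-Recipes-style LCG parameters;
    any full-period LCG would make the same point.
    """
    state = seed
    out = bytearray()
    for _ in range(n):
        state = (1664525 * state + 1013904223) % 2**32
        out.append(state >> 24)  # keep the high byte
    return bytes(out)

data = lcg_bytes(1_000_000)
# The byte histogram of `data` is nearly flat (close to 8 bits/byte by the
# histogram estimate), but its true description length is only a few hundred
# bytes: this function plus the seed. Statistically flat does not mean random.
```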
John E. Kreznar writes:

Program-length complexity is well-studied in the literature. A good introductory paper is ``A Theory of Program Size Formally Identical to Information Theory'' by G. J. Chaitin, _Journal of the ACM_, 22 (1975), reprinted in Chaitin's book _Information Randomness & Incompleteness_, World Scientific Publishing Co., 1990.
The Li and Vitanyi chapter in the _Handbook of Theoretical Computer Science_, Vol. A, is a nice review. And your library probably has that handbook, while it may not have their new text.
Eli   ebrandt@jarthur.claremont.edu
PGP 2 key by finger or e-mail
"Your hideous criminal clock, your insidious time bomb,
is tick-tick-ticking."  -- L. Detweiler
John Kreznar writes:
Given a particular set of data used to generate a random key, such as a Unix box's /dev/mem, how can one measure the number of bits of entropy?
Actually, it can't be done. The consistent measure of entropy for finite objects like a string or a (finite) series of random numbers is the so-called ``program length complexity''. This is defined as the length of the shortest program for some given universal Turing machine which computes the string. It's consistent in the sense that it has the familiar properties of ``ordinary'' (Shannon) entropy. Unfortunately, it's uncomputable: there's no algorithm which, given an arbitrary finite string S, computes the program-length complexity of S.
The intuitive idea is similar to there being no "maximum compression" of a string: one may strongly suspect a given compression is pretty good, and it may in fact be the best there is, but one can never rule out finding an even better one. Like the "pi" example Scott Collins used. Still, one can make estimates of the entropy of a string.
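[One common practical stand-in for such an estimate is to treat the output of a general-purpose compressor as an upper bound on description length. A minimal sketch, using zlib purely as an example; as the comments note, it can only bound the complexity from above.]

```python
import os
import zlib

def compressed_size_bound(data: bytes) -> int:
    """Upper bound, in bits, on the description length of `data`.

    zlib is nowhere near an optimal compressor, so this only bounds the
    program-length complexity from above: a string that compresses well is
    certainly not random, but one that doesn't compress may still have a
    short description zlib cannot find (the digits of pi, for example).
    """
    return 8 * len(zlib.compress(data, level=9))

print(compressed_size_bound(b"ab" * 5000))        # highly regular: small bound
print(compressed_size_bound(os.urandom(10000)))   # incompressible: ~80000 bits
```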
Program-length complexity is well-studied in the literature. A good introductory paper is ``A Theory of Program Size Formally Identical to Information Theory'' by G. J. Chaitin, _Journal of the ACM_, 22 (1975) reprinted in Chaitin's book _Information Randomness & Incompleteness_, World Scientific Publishing Co., 1990.
And an especially good place to read all about this is the new book by Ming Li and Paul Vitanyi, "An Introduction to Kolmogorov Complexity and Its Applications," Springer-Verlag, 1993. $60. Lots of good chapters on entropy, program-length measures, algorithmic information theory, etc. Ironically, no mention of cryptology at all. (But Charles Bennett, one of the pioneers--especially in the area of "logical depth"--has written about the deep links between the two areas.)

Basically, ciphertext messages are "cryptoregular" in that they _appear_ to be of high entropy (random) but actually have low entropy once the right transformation (key) is applied. You clever folks will by now have seen the link to the opening discussion: how does one know whether a given text is "cryptoregular" and actually carries a message, or is just random junk? The answer in general is that no mechanistic/algorithmic method exists!

(Hardly surprising, if you think about it. A one-time pad is information-theoretically secure. Every English (or Russian, etc.) sentence of length L can be "found" in a ciphertext of length L by trying the "right" pad. A thousand monkeys and all that.)

For messages that are not encrypted with one-time pads, this is not the case, and various bits of information can sometimes be extracted. Cryptanalysis sometimes works. Last I heard, though, it doesn't help with breaking RSA (chosen-plaintext attacks on RSA don't help with the factoring problem at all... consult the textbooks on the exact situation, if you're interested in such subtleties).

Kolmogorov-Chaitin measures of complexity are very exciting.

--Tim May

--
..........................................................................
Timothy C. May         | Crypto Anarchy: encryption, digital money,
tcmay@netcom.com       | anonymous networks, digital pseudonyms, zero
408-688-5409           | knowledge, reputations, information markets,
W.A.S.T.E.: Aptos, CA  | black markets, collapse of governments.
Higher Power: 2^756839 | Public Key: PGP and MailSafe available.
Note: I put time and money into writing this posting. I hope you enjoy it.
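[The one-time-pad point above is easy to demonstrate: for any intercepted ciphertext, one can fabricate a pad under which it "decrypts" to any chosen plaintext of the same length. A minimal sketch; the message and the random "ciphertext" are of course just placeholders.]

```python
import os

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

claimed_plaintext = b"ATTACK AT DAWN"
ciphertext = os.urandom(len(claimed_plaintext))   # pretend this was intercepted

# Construct the pad under which the intercepted bytes "decrypt" to our claim.
# Since such a pad exists for every plaintext of this length, the ciphertext
# alone carries no information about which message was actually sent.
fabricated_pad = xor(ciphertext, claimed_plaintext)
assert xor(ciphertext, fabricated_pad) == claimed_plaintext
```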
participants (4)
- Eli Brandt
- jerry@terminus.dell.com
- jkreznar@ininx.com
- tcmay@netcom.com