On Thu, 11 Apr 1996, Timothy C. May wrote:
That there can be no simple definition of entropy, or randomness, for an arbitrary set of things, is essentially equivalent to Godel's Theorem.
I think what you mean is that there is no simple way to measure randomness. Simple, clean definitions of randomness do exist. In fact there are two closely related definitions. The first is entropy, which is a measure of the unpredictability of a random variable. (A random variable is a set of values with a probability assigned to each value; in this case the values are bit strings.) The second is algorithmic complexity, which is a measure of the incompressibility of a bit string. Notice that it doesn't make sense to talk about the entropy of a string or the algorithmic complexity of a random variable.

Unfortunately both of these quantities are very difficult to measure. Algorithmic complexity is provably uncomputable: given a string, you can never tell when you have found its best compression. Entropy can in principle be determined, but in practice it's hard because you must have a good probability model of the mechanism that generates the random variable.

What we want to do is calculate the entropy of the output of a physical random number generator. If we have a probability model of the rng, we're home free. For example, if the rng tosses a fair coin 64 times, it's easy to calculate that the entropy is 64 bits. But what if the rng is too complex to be easily modeled (for example, a human being pounding on the keyboard)?

Algorithmic information theory says that the entropy of a random variable equals its expected algorithmic complexity (up to an additive constant). So if we could calculate algorithmic complexity, we could estimate the entropy by sampling the output of the rng many times, calculating the algorithmic complexity of each sample, and taking the average as the estimated entropy. Unfortunately, we already know that algorithmic complexity is NOT computable. The best we can do, and what is apparently already done in practice, is to find an upper bound (call it x) on the algorithmic complexity of a string by trying various compression schemes, divide that number by a constant safety factor (say 10), and use x/10 as a conservative estimate of the algorithmic complexity.

(Tim, I know you already understand all this, but your earlier explanation wasn't very clear. I hope this helps those who are still confused.)
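To make the entropy calculation concrete, here is a little Python sketch (my own illustration, not anything from Tim's post; the function name is mine). It computes the entropy of a random variable from an explicit probability model, and checks the fair-coin example:

import math

def shannon_entropy(probs):
    # Entropy in bits: H = -sum(p * log2(p)) over all values with p > 0.
    # probs lists the probability of each possible value; they must be
    # nonnegative and sum to 1.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin has entropy log2(2) = 1 bit per toss, and entropy is
# additive over independent tosses, so 64 tosses give 64 bits.  We
# compute it this way rather than enumerating all 2^64 outcomes.
print(64 * shannon_entropy([0.5, 0.5]))   # -> 64.0

The model requirement is visible right in the signature: you can't call this function without already knowing the probabilities, which is exactly what we lack for something like keyboard pounding.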
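And here is a sketch of the conservative compression-based estimate, under the same caveat that it's my own illustration: zlib stands in for "various compression schemes", and the safety factor of 10 is just the example figure from above, not a principled constant.

import zlib

def conservative_entropy_estimate(samples, safety_factor=10):
    # For each sample (a byte string), the length of its zlib-compressed
    # form is an upper bound x on its algorithmic complexity.  x divided
    # by the safety factor is a deliberately conservative per-sample
    # figure; averaging over many samples estimates the rng's entropy.
    estimates = []
    for s in samples:
        x = 8 * len(zlib.compress(s, 9))   # upper bound, in bits
        estimates.append(x / safety_factor)
    return sum(estimates) / len(estimates)

# Usage: feed it many independent samples of the rng's output, e.g.
# keystroke timings packed into byte strings:
#   print(conservative_entropy_estimate(samples))

Note that the compressed length can only overestimate the true algorithmic complexity (real compressors add headers and miss regularities), so the division by the safety factor errs in the safe direction.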
(To forestall charges that I am relying on an all-too-common form of bullshitting, by referring to Godel, what I mean is that "randomness" is best defined in terms of algorithmic information theory, a la Kolmogorov and Chaitin, and explored in Li and Vitanyi's excellent textbook, "Algorithmic Information Theory and its Applications.")
A year ago, you recommended to me a book by the same authors titled _An Introduction to Kolmogorov Complexity and Its Applications_. Have the authors written a new book, or are these the same?

Wei Dai