Re: why compression doesn't perfectly even out entropy
At 5:36 AM 4/11/96, jamesd@echeque.com wrote:
At 10:36 PM 4/9/96 -0400, JonWienke@aol.com wrote:
Would anyone like to propose a means of measuring entropy that we can all agree on? I haven't seen anything yet that everyone likes.
Nor will you: To measure entropy is a deep unsolved philosophical and physical problem.
Indeed. That there can be no simple definition of entropy, or randomness, for an arbitrary set of things, is essentially equivalent to Godel's Theorem.

(To forestall charges that I am relying on an all-too-common form of bullshitting by referring to Godel: what I mean is that "randomness" is best defined in terms of algorithmic information theory, a la Kolmogorov and Chaitin, and explored in Li and Vitanyi's excellent textbook, "Algorithmic Information Theory and its Applications.")

Think of it this way: when can a set of things, a string, etc., be _compressed_? Answer: whenever a compression is found. Most things have no real compressions; that is, they have no shorter description than themselves. But they _might_ have a shorter description, a compression, and we can never say for sure that they do not.

Thus, even a set which we think is of "high entropy" (roughly, "high randomness," "no order," or "not compressible") may actually have some hidden order, or compressibility, not apparent at first glance. That we can never know when we have achieved maximum compression is a profound result of modern mathematics and information theory.

--Tim May

Boycott "Big Brother Inside" software!
We got computers, we're tapping phone lines, we know that that ain't allowed.
---------:---------:---------:---------:---------:---------:---------:----
Timothy C. May              | Crypto Anarchy: encryption, digital money,
tcmay@got.net  408-728-0152 | anonymous networks, digital pseudonyms, zero
W.A.S.T.E.: Corralitos, CA  | knowledge, reputations, information markets,
Higher Power: 2^756839 - 1  | black markets, collapse of governments.
"National borders aren't even speed bumps on the information superhighway."
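[Editor's note: a minimal Python sketch of the point above, not part of the original message. A byte string drawn from a seeded pseudorandom generator resists a standard compressor, yet its true description, the generator plus its seed, is only a few lines long.]

    import random
    import zlib

    # One megabyte that "looks random": a generic compressor finds no
    # shorter form for it.
    random.seed(42)
    data = random.randbytes(1_000_000)  # requires Python 3.9+
    compressed = zlib.compress(data, 9)
    print(len(data), len(compressed))   # the "compressed" form is no shorter

    # Yet a complete description of this exact string is the two generator
    # lines above plus the seed 42: a hidden compression zlib cannot find.
    # Failing to compress a string never proves the string is incompressible.

The converse direction does work: any compression that succeeds is a proof of non-randomness. Compression therefore yields upper bounds on randomness, but never a certificate that maximum compression has been reached.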
On Thu, 11 Apr 1996, Timothy C. May wrote:
That there can be no simple definition of entropy, or randomness, for an arbitrary set of things, is essentially equivalent to Godel's Theorem.
I think what you mean is that there is no simple way to measure randomness. Simple, nice definitions of randomness do exist.

Actually, there are two closely related definitions of randomness. The first is entropy, which is a measure of the unpredictability of a random variable. (A random variable is a set of values with a probability assigned to each value; in this case the values are bit strings.) The second is algorithmic complexity, which is a measure of the incompressibility of a bit string. Notice that it doesn't make sense to talk about the entropy of a string or the algorithmic complexity of a random variable.

Unfortunately, both of these quantities are very difficult to measure. Algorithmic complexity is provably uncomputable, because given a string you can't tell when you have found the best compression for it. Entropy can in principle be determined, but in practice it's hard, because you must have a good probability model of the mechanism that generates the random variable.

What we want to do is calculate the entropy of the output of a physical random number generator. Now if we have a probability model of the rng, then we're home free. For example, if the rng is tossing a fair coin 64 times, then it's easy to calculate that the entropy is 64 bits. But what if the rng is too complex to be easily modeled (for example, if it's a human being pounding on the keyboard)?

Algorithmic information theory says the entropy of a random variable is equal to its expected algorithmic complexity. So if we could calculate algorithmic complexity, we could estimate the entropy by sampling the output of the rng many times, calculating the algorithmic complexity of each sample, and taking their average as the estimated entropy. Unfortunately, we already know that algorithmic complexity is NOT computable.

The best we can do, and what is apparently already done in practice, is to find an upper bound (call it x) on the algorithmic complexity of a string by trying various compression schemes, divide that number by a constant (say 10), and use x/10 as a conservative estimate of the algorithmic complexity. (A sketch of this procedure follows the message.)

(Tim, I know you already understand all this, but your earlier explanation wasn't very clear. I hope this helps those who are still confused.)
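[Editor's note: a minimal Python sketch of the procedure described above, not code from the thread. It computes Shannon entropy from an explicit probability model (the fair-coin case) and a compression-based upper bound on algorithmic complexity. The safety factor of 10 comes from the message; the particular compressors (zlib, bz2, lzma) are arbitrary stand-ins for "various compression schemes."]

    import bz2
    import lzma
    import math
    import zlib

    def shannon_entropy_bits(probabilities):
        """Shannon entropy H = -sum(p * log2(p)) of one draw from a
        modeled source, in bits."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    # If the rng tosses a fair coin, each toss contributes
    # -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit, so 64 tosses give 64 bits.
    print(64 * shannon_entropy_bits([0.5, 0.5]))  # -> 64.0

    def complexity_upper_bound_bits(sample: bytes) -> int:
        """Upper bound on algorithmic complexity, in bits: the shortest
        output among several compression schemes. Any compression that
        succeeds is a description of the sample, so its length bounds
        the true (uncomputable) complexity from above."""
        candidates = [
            zlib.compress(sample, 9),
            bz2.compress(sample, 9),
            lzma.compress(sample),
        ]
        return 8 * min(len(c) for c in candidates)

    def conservative_entropy_estimate(samples, safety_factor=10):
        """Average the per-sample upper bounds x and use x/safety_factor
        (10 here, per the message above) as a deliberately pessimistic
        estimate of entropy in bits per sample."""
        bounds = [complexity_upper_bound_bits(s) for s in samples]
        return (sum(bounds) / len(bounds)) / safety_factor

Note that compressor framing overhead inflates the bound for short samples; that only makes the estimate more pessimistic, which is the intent of dividing by a safety factor.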
(To forestall charges that I am relying on an all-too-common form of bullshitting, by referring to Godel, what I mean is that "randomness" is best defined in terms of algorithmic information theory, a la Kolmogorov and Chaitin, and explored in Li and Vitanyi's excellent textbook, "Algorithmic Information Theory and its Applications.")
A year ago, you recommended a book to me by the same authors, titled _An Introduction to Kolmogorov Complexity and Its Applications_. Have the authors written a new book, or are these the same?

Wei Dai