On Wednesday, January 9, 2002, at 09:24 AM, Michael Motyka wrote:
Ken Brown <k.brown@ccs.bbk.ac.uk> wrote :
Michael Motyka wrote:
Here we go :
"As Eric correctly points out, true random input simply cannot be compressed, it doesn't matter how clever you are or how much time and computing power you have access to."
This is a statement of belief, isn't it? Odd.
No, it's a logical consequence of his definition of "random" & no more a statement of belief than "1+1=2" is. If you use the word to mean something else then it might or might not be true in your universe.
Ken
Perhaps my universe is off in all six degrees, but answer this - which is it :
random data cannot be compressed
First, there is no such thing as "a set of random data" (which is what I take you to mean by "random data"). There are operational definitions of "a set of data generated by a process which appears to be random." Clicks of a Geiger counter, thermal noise from a reverse-biased diode, etc. (If my words "appears to be random" are weaselly, it's the nature of the beast. It is generally not possible to prove either that a source _is_ random or that a set of data is random.) Of the strings of length N which we would all call random, the majority have no description shorter than themselves.
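A minimal Python sketch of the counting argument behind that last sentence (an illustration, not from the original post; the choices of N and k are arbitrary):

    # A lossless compressor maps bit strings to bit strings injectively,
    # so length-N strings that compress by more than k bits must fit into
    # the fewer than 2**(N - k) descriptions that are that short.  Hence
    # fewer than 1 in 2**k of all length-N strings can compress that far.

    N = 128  # string length in bits; the bound holds uniformly for any N

    for k in (1, 8, 16, 32):
        shorter = 2 ** (N - k) - 1   # descriptions shorter than N - k bits
        bound = shorter / 2 ** N     # upper bound on the compressible fraction
        print(f"compressible by more than {k:2d} bits: "
              f"fewer than {bound:.2e} of all strings")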
or
random data may include recognizable and even repeating patterns
Of course any particular string will have clumps and small patterns, etc. That doesn't make it compressible, however. Run-length encoding on a "random" data set or string doesn't yield any compression, on average. (For a different, intuitive slant on why this is so: it's related to why "doubling up" strategies for recouping losing bets fail. Theory of martingales.)
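A minimal sketch of that claim in Python (not from the post; the nine-bits-per-run cost model is just one plausible encoding choice):

    import random

    def rle(bits):
        """Run-length encode a list of bits as (bit, run_length) pairs."""
        runs, i = [], 0
        while i < len(bits):
            j = i
            while j < len(bits) and bits[j] == bits[i]:
                j += 1
            runs.append((bits[i], j - i))
            i = j
        return runs

    random.seed(1)
    N, trials, encoded = 4096, 100, 0
    for _ in range(trials):
        bits = [random.randint(0, 1) for _ in range(N)]
        # charge 1 bit for the symbol + 8 bits for the run length
        # (runs long enough to overflow 8 bits are astronomically rare here)
        encoded += 9 * len(rle(bits))

    print(f"original: {N} bits; RLE average: {encoded / trials:.0f} bits")
    # Fair coin flips produce runs averaging two bits, so RLE expands the data.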
A string of random data may include anything. Compressibility will vary with the particular data and the algorithm used. Lucky with one input set, unlucky with another. I wouldn't try to base a business on getting lucky with the customer's input data, but the absolute statement of incompressibility is wrong unless you preselect your data sets. Then they're no longer random, and the definition of random becomes circular.
You are missing the big picture. Fewer than 1 in 2^k of the bit strings of length N have descriptions more than k bits shorter than themselves, simply because there are fewer than 2^(N-k) descriptions that short. The rest will actually require longer descriptions, counting the necessary dictionary. As N approaches infinity, a vanishingly small fraction of the bit strings have short descriptions. This makes sense for a number of reasons, once you start thinking about it.

Definition: Tim's Magic Number: 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0

So I present you with the number 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 and ask you to compress it. You look at it, recognize it as Tim's Magic Number, and proudly announce that it has a very concise (short) description: TMN.

For why this will in general not work, how it relates to intelligence tests (where "What is this pattern?" questions have multiple good answers), and how it relates to Berry's Paradox, see the literature on randomness. I'm sorry, but I just can't write a chapter of a book on randomness, laying out all the basics. Even a tutorial on randomness has no shorter description than itself.

There are a bunch of Web pages. Keywords: randomness, bit string complexity, algorithmic information theory, Kolmogorov, Solomonoff, Chaitin, etc. There's a recent book devoted entirely to an introduction to randomness, called "Randomness." John Casti, Rudy Rucker, and John Barrow have all written good popular accounts. I can't remember if John Pierce's seminal book "Symbols, Signals and Noise" goes into this, but I expect it does.

--Tim May

"If I'm going to reach out to the Democrats then I need a third hand. There's no way I'm letting go of my wallet or my gun while they're around." --attribution uncertain, possibly Gunner, on Usenet
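A quick empirical check of the post's central claim (a sketch, using Python's zlib as an arbitrary stand-in for any general-purpose compressor):

    import os
    import zlib

    random_data = os.urandom(100_000)   # bytes from the OS entropy source
    patterned = b"abcd" * 25_000        # same length, obvious structure

    for name, data in (("random", random_data), ("patterned", patterned)):
        out = zlib.compress(data, 9)    # maximum compression effort
        print(f"{name:>9}: {len(data)} -> {len(out)} bytes")

The patterned input shrinks to a tiny fraction of its size; the random input comes back slightly larger than it went in, as the counting argument predicts.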