On Wednesday, January 9, 2002, at 09:24 AM, Michael Motyka wrote:
Ken Brown <k.brown@ccs.bbk.ac.uk> wrote :
Michael Motyka wrote:
Here we go :
"As Eric correctly points out, true random input simply cannot be compressed, it doesn't matter how clever you are or how much time and computing power you have access to."
This is a statement of belief, isn't it? Odd.
No, it's a logical consequence of his definition of "random" & no more a statement of belief than "1+1=2" is. If you use the word to mean something else then it might or might not be true in your universe.
Ken
Perhaps my universe is off in all six degrees, but answer this - which is it :
random data cannot be compressed
First, there is no such thing as "a set of random data" (which is what I take you to mean by "random data"). There are operational definitions of "a set of data generated by a process which appears to be random." Clicks of a Geiger counter, thermal noise from a reverse-biased diode, etc. (If my words "appears to be random" are weaselly, it's the nature of the beast. It is generally not possible to prove either that a source _is_ random or that a set of data is random.) Of the strings of length N which we would all call random, the majority have no description shorter than themselves.
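A minimal Python sketch of the counting argument behind that last sentence (an illustration, not from the original post; the choices of N and k are arbitrary):

    # A lossless compressor maps bit strings to bit strings injectively,
    # so length-N strings that compress by more than k bits must fit into
    # the fewer than 2**(N - k) descriptions that are that short.  Hence
    # fewer than 1 in 2**k of all length-N strings can compress that far.

    N = 128  # string length in bits; the bound holds uniformly for any N

    for k in (1, 8, 16, 32):
        shorter = 2 ** (N - k) - 1   # descriptions shorter than N - k bits
        bound = shorter / 2 ** N     # upper bound on the compressible fraction
        print(f"compressible by more than {k:2d} bits: "
              f"fewer than {bound:.2e} of all strings")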
or
random data may include recognizable and even repeating patterns
Of course any particular string will have clumps and small patterns, etc. That doesn't make it compressible, however. Run-length encoding on a "random" data set or string doesn't yield any compression, on average. (For a different, intuitive slant on why this is so: it's related to why "doubling up" strategies for recouping losing bets fail. Theory of martingales.)
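A minimal sketch of that claim in Python (not from the post; the nine-bits-per-run cost model is just one plausible encoding choice):

    import random

    def rle(bits):
        """Run-length encode a list of bits as (bit, run_length) pairs."""
        runs, i = [], 0
        while i < len(bits):
            j = i
            while j < len(bits) and bits[j] == bits[i]:
                j += 1
            runs.append((bits[i], j - i))
            i = j
        return runs

    random.seed(1)
    N, trials, encoded = 4096, 100, 0
    for _ in range(trials):
        bits = [random.randint(0, 1) for _ in range(N)]
        # charge 1 bit for the symbol + 8 bits for the run length
        # (runs long enough to overflow 8 bits are astronomically rare here)
        encoded += 9 * len(rle(bits))

    print(f"original: {N} bits; RLE average: {encoded / trials:.0f} bits")
    # Fair coin flips produce runs averaging two bits, so RLE expands the data.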
A string of random data may include anything. Compressibility will vary with the particular data and the algorithm used. Lucky with one input set, unlucky with another. I wouldn't try to base a business on getting lucky with the customer's input data, but the absolute statement of incompressibility is wrong unless you preselect your data sets. Then they're no longer random, and the definition of random becomes circular.
You are missing the big picture. Fewer than 1 in 2^k of the bit strings of length N have descriptions more than k bits shorter than themselves, simply because there are fewer than 2^(N-k) descriptions that short. The rest will actually require longer descriptions, counting the necessary dictionary. As N approaches infinity, a vanishingly small fraction of the bit strings have short descriptions. This makes sense for a number of reasons, once you start thinking about it.

Definition: Tim's Magic Number: 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0

So I present you with the number 1 0 1 1 1 0 1 1 0 0 1 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 and ask you to compress it. You look at it, recognize it as Tim's Magic Number, and proudly announce that it has a very concise (short) description: TMN.

For why this will in general not work, how it relates to intelligence tests (where "What is this pattern?" questions have multiple good answers), and how it relates to Berry's Paradox, see the literature on randomness. I'm sorry, but I just can't write a chapter of a book on randomness, laying out all the basics. Even a tutorial on randomness has no shorter description than itself.

There are a bunch of Web pages. Keywords: randomness, bit string complexity, algorithmic information theory, Kolmogorov, Solomonoff, Chaitin, etc. There's a recent book devoted entirely to an introduction to randomness, called "Randomness." John Casti, Rudy Rucker, and John Barrow have all written good popular accounts. I can't remember if John Pierce's seminal book "Symbols, Signals and Noise" goes into this, but I expect it does.

--Tim May

"If I'm going to reach out to the Democrats then I need a third hand. There's no way I'm letting go of my wallet or my gun while they're around." --attribution uncertain, possibly Gunner, on Usenet
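A quick empirical check of the post's central claim (a sketch, using Python's zlib as an arbitrary stand-in for any general-purpose compressor):

    import os
    import zlib

    random_data = os.urandom(100_000)   # bytes from the OS entropy source
    patterned = b"abcd" * 25_000        # same length, obvious structure

    for name, data in (("random", random_data), ("patterned", patterned)):
        out = zlib.compress(data, 9)    # maximum compression effort
        print(f"{name:>9}: {len(data)} -> {len(out)} bytes")

The patterned input shrinks to a tiny fraction of its size; the random input comes back slightly larger than it went in, as the counting argument predicts.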