(Will whomever prepending this "Re: *** GMX Spamverdacht ***" header please STOP.) On Monday, August 18, 2003, at 09:01 AM, Dave Howe wrote:
randomness is a funny thing. a truely random file can be *anything* that is the right length - including a excerpt from william shakespere (provided you have enough monkeys) it is most likely to be random garbage, for a large sample - but for a very small sample the odds of it being an ordered sample are surprisingly good.
the obvious example here is a double coin test - two bits. an ordered sample would be both heads or both tails. a disordered sample would be one head and one tail. in practice, you would expect to see half the results from a larger trial (say 32 double-throws) with a ordered sample, and half disordered. as you reach three coins, the odds of a completely ordered result decrease (from 2 in 2^2, to 2 in 2^3). for four coins, you still have the same two compressable results. consider: HHHH HHHT HHTH HHTT HTHH HTHT HTTH HTTT THHH THHT THTH THTT TTHH TTHT TTTH TTTT
in a large trial, you would expect to see each of these once every 2^4 tests, on the average. obviously HHHH and TTTT are very compressable. assuming a runlength encoding, I don't think any of the others are compressable at all...."We should not march into Baghdad. To occupy Iraq would instantly shatter our coalition, turning the whole Arab world against us and make a broken tyrant into a latter- day Arab hero. Assigning young soldiers to a fruitless hunt for a securely entrenched dictator and condemning
Quibble: only surprising if one misunderstands probability. (Not saying you do, just quibbling with any claim that readily calculated probabilities can be "surprising.") them to fight in what would be an unwinable urban guerilla war, it could only plunge that part of the world into ever greater instability." --George H. W. Bush, "A World Transformed", 1998 For any sequence of n fair tosses, the "length after compression" of the outcome is roughly, modulo some constants and factor, (1 - 2^(-n)), where "1" is the uncompressed length. In other words, as n gets large nearly all strings have a "length after compression" that is close to the original length, i.e., little compression. As n gets arbitrarily large, an arbitrarily small fraction of strings have a short, concise description (are "compressed"). --Tim May