On 9 Jan 2002, at 10:24, Michael Motyka wrote:
> Perhaps my universe is off in all six degrees, but answer this - which is it:
> random data cannot be compressed
> or
> random data may include recognizable and even repeating patterns
Both. I'll try this one last time and then give up on you. Any algorithm that maps files onto files will be AT LEAST as likely to increase the size of a file as to decrease it if fed random data. Further, the chance of significantly decreasing the size of a file derived from a random source with any given algorithm is vanishingly small (since there are 256 times as many N-byte files as there are (N-1)-byte files, the odds of a random file being compressible by even a single byte by a given algorithm can't be any better than 1 in 256, and the odds of compressing it by two bytes are even worse).
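If you'd rather see the counting argument in practice than argue about it, here's a quick sketch (just my own illustration, with zlib standing in for "any given algorithm") that runs blocks of os.urandom output through zlib at its highest setting and counts how many come out smaller:

    import os
    import zlib

    TRIALS = 1000        # number of random blocks to test
    BLOCK_SIZE = 4096    # bytes per block

    shrank = 0
    for _ in range(TRIALS):
        block = os.urandom(BLOCK_SIZE)       # high-quality random bytes
        packed = zlib.compress(block, 9)     # best-effort DEFLATE
        if len(packed) < len(block):
            shrank += 1

    print(shrank, "of", TRIALS, "blocks got smaller")

I'd expect the answer to be 0 of 1000 every time you run it: zlib's header and framing add a few bytes, and random input has no redundancy to pay for them, so the "compressed" output comes out slightly larger than the input.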
> A string of random data may include anything. Compressibility will vary with the particular data and the algorithm used. Lucky with one input set, unlucky with another.
No. Unlucky with virtually all of them. There aren't words to describe how unlikely it is that a 1 megabyte file of random data could be compressed by a factor of 100 by any given algorithm. It's not like winning the lottery. It's not even like winning the lottery 100 times in a row. We're talking vastly improbable.
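To put even a rough number on it (back-of-the-envelope counting only, using the 1 megabyte figure above): there are 256^1048576 possible one-megabyte files but only about 256^10486 possible outputs a hundred times shorter, so no fixed algorithm can compress more than roughly one in 10^2,500,000 of them by that factor. A lottery jackpot is roughly 1 in 10^8; a hundred jackpots in a row is still only about 1 in 10^800.

    import math

    MB = 1_048_576              # bytes in a 1 MB file
    TARGET = MB // 100 + 1      # ~10,486 bytes, i.e. 100x compression

    # Pigeonhole counting: roughly 256**TARGET outputs of that length or
    # shorter exist, against 256**MB equally likely random inputs, so the
    # fraction of inputs any one algorithm can shrink that far is at most
    # 256**(TARGET - MB).
    log10_odds = (MB - TARGET) * math.log10(256)
    print("chance of 100x compression: at most 1 in 10^%d" % round(log10_odds))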
> I wouldn't try to base a business on getting lucky with the customer's input data, but the absolute statement of incompressibility is wrong unless you preselect your data sets. Then they're no longer random and the definition of random becomes circular.
> Mike
George