Michael Motyka wrote:
> What exactly is random data? Does it have to appear to be random?
> Does it have to pass some set of statistical tests to be random? If a
> string of bits from a radiation source spells "Merry Christmas and a
> Happy New Year" in ASCII, is it non-random - a message from above?
Randomness is generally defined for sequences, which are mappings from the non-negative integers onto some finite set of symbols. A sequence is random if its values are independently selected and uniformly distributed. A random sequence of ASCII characters will not only spell out "Merry Christmas and a Happy New Year" numerous times, but will spell out every other possible piece of text as well. In a random sequence of bits, all 2^N N-bit strings are equally likely to occur as we step through the sequence, for any chosen value of N.

The theory of compression works with sequences. Finite sets of data, or strings, are considered to be initial segments of sequences. There is also the notion of the Kolmogorov complexity of a string, which is the length of the shortest program that generates it. This leads to the Kolmogorov definition of a random string: one whose shortest description is itself. Most practical compression algorithms, however, do not work with the complexity of data in the Kolmogorov sense, but merely compress a sequence using a fixed amount of its previous history as a dictionary.

As a simple example of how a good compression algorithm might work, consider a person sitting in front of a console that displays the first N-1 characters of some text. The person tries to guess character N, making successive guesses until the computer tells him he has succeeded. If we entropy code the number of guesses at each point, and we presume different people use the same deterministic guessing algorithm, based on knowledge of the English language and of what has previously occurred in the text, we can transform any English text of more than trivial length into fewer bits than are required to represent it in ASCII.

If we imagine such a scheme applied to the random binary sequence described earlier, the person will on average require one guess 50% of the time and two guesses the other 50% of the time to correctly identify the next bit. Entropy coding two symbols which occur with equal frequency requires one bit per occurrence. In this case, attempted "compression" produces one bit of output for each bit of input, demonstrating that a random sequence cannot be compressed.

Talking about a finite string being "random" in the commonly accepted non-Kolmogorov sense is therefore meaningless, apart from the string being the initial segment of some random sequence. Any random string is produced by taking a finite number of instances of some random variable, and such a random variable has, by definition, an infinite sequence of such instances over which its mathematical properties are defined.
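To make the 2^N claim concrete, here's a quick Python sketch (the window size N=3 and the sequence length are arbitrary choices for illustration) that steps through a pseudorandom bit sequence and tallies every N-bit window:

import random
from collections import Counter

random.seed(1)                        # fixed seed so the run repeats
bits = [random.randrange(2) for _ in range(100000)]

N = 3                                 # any N works; 3 keeps the output short
windows = Counter(tuple(bits[i:i + N]) for i in range(len(bits) - N + 1))

# If the bits are independent and uniform, each of the 2^3 = 8 possible
# 3-bit strings should occur with frequency close to 1/8.
for pattern, count in sorted(windows.items()):
    print(pattern, count / (len(bits) - N + 1))

A pseudorandom generator only imitates the real thing, of course, but the frequencies come out near 0.125 all the same.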
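The "fixed amount of previous history as a dictionary" approach is the LZ77 family. Here's a deliberately naive sketch of the idea; real implementations use hashing to find matches and entropy code the output tokens, but the sliding-window matching is the essence:

def lz77_tokens(data, window=4096, max_match=255):
    # Greedy LZ77: emit (offset, length, next_char) triples, matching
    # only against a fixed-size window of previous history.
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            k = 0
            while (i + k < len(data) - 1 and k < max_match
                   and data[j + k] == data[i + k]):
                k += 1
            if k > best_len:
                best_off, best_len = i - j, k
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(tokens):
    out = []
    for off, length, nxt in tokens:
        for _ in range(length):
            out.append(out[-off])     # copy from the history window
        out.append(nxt)
    return ''.join(out)

text = "this is a test. this is only a test. " * 4
tokens = lz77_tokens(text)
assert lz77_decode(tokens) == text
print(len(tokens), "tokens for", len(text), "characters")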
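And here's a toy version of the guessing game. The "person" is a crude stand-in of my own devising, nothing canonical: it ranks candidate characters by how often they followed the current two-character context earlier in the text, and we then measure the entropy of the stream of guess numbers:

import math
from collections import Counter

def guess_numbers(text, order=2):
    # Deterministic "guesser": rank characters by how often they followed
    # the last `order` characters earlier in the text, then by overall
    # frequency, then alphabetically.  Record which guess succeeds.
    result = []
    for i in range(order, len(text)):
        context, history = text[i - order:i], text[:i]
        follows = Counter(history[j + order]
                          for j in range(i - order)
                          if history[j:j + order] == context)
        overall = Counter(history)
        ranking = sorted(set(history) | {text[i]},
                         key=lambda c: (-follows[c], -overall[c], c))
        result.append(ranking.index(text[i]) + 1)
    return result

def entropy(symbols):
    # Average bits per symbol if we entropy code the symbol stream.
    n, freq = len(symbols), Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in freq.values())

text = "the quick brown fox jumps over the lazy dog. " * 20
print(round(entropy(guess_numbers(text)), 2), "bits/char vs 8 for raw ASCII")

On English-like input the first guess is usually right, the guess numbers are heavily skewed toward 1, and the entropy lands well under 8 bits per character.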
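Finally, the one-bit-in, one-bit-out result is easy to watch with any real compressor. zlib (LZ77 plus entropy coding) flattens repetitive English but gets nowhere against bytes from the OS entropy source:

import os, zlib

rand = os.urandom(100000)
text = (b"Merry Christmas and a Happy New Year. " * 2600)[:100000]

print("random:", len(zlib.compress(rand)) / len(rand))   # ratio ~1.0
print("text:  ", len(zlib.compress(text)) / len(text))   # ratio far below 1

The slight expansion on random input is just container overhead; by the pigeonhole argument above, no setting of zlib's knobs can do better than break even on average.

-- 
Eric Michael Cordian 0+
O:.T:.O:. Mathematical Munitions Division
"Do What Thou Wilt Shall Be The Whole Of The Law"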