Random Number Testing

JonWienke@aol.com writes:
The "randomness" of the output of my sieve is obviously not perfect. However, it will remove bytes from a data stream that "randomness" tests describe as "nonrandom".
And Perry writes:
You are displaying a profound ignorance of the notion of "random" here. An individual byte cannot be random or nonrandom. Random is a statistical property, and therefore can only apply to a mass of data.
The limitation seems to be in the testing process. An inviolable rule of test and measurement is that the testing process must have a higher degree of accuracy and precision than is expected from the item under test. If all you have is a non-graduated stick one meter long, you can't measure increments of less than one meter unless you add some additional resolution to the system (such as a human's judgment that the object is four sticks, plus about another third of a stick, long). Our ability to synthesize pseudo-random data, and the degree to which it approaches true random data, is limited by our ability to discern between the two. If the best testing process available yields the same results for the output of a true RNG and a particular PRNG, what can be done to improve the randomness of the PRNG? How could we tell the difference based on the data itself? About all you could do would be to critique the PRNG or sieving process from an empirical perspective and hope that you weren't introducing some other non-random bias to the system. I'm sure that some perfectly random data would be discarded as being non-random, especially for a small sample size. The sequence HHHHTTTT doesn't look random, but this outcome of 8 flips is equally probable for a "perfect" coin. Do you keep this data or toss it because it doesn't "look" random? If you toss it, aren't you removing entropy from the sample? It may be (relatively) easy to say that radioactive decay is a truly random process and a sieve which discards certain re-occuring values is not. Improving the testing process, and increasing the size of the data used for that testing, will improve the sieve output. However, it looks like an asymptotic situation: the PRNG can approach, but never quite reach, the true RNG case. As long as human analysis and discretion is involved, there will always be a certain amount of non-randomness in the output of a PRNG. My question: if you don't know the source of the data, how can you _really_ determine if it's random enough for crypto use? Given any PRNG string of a certain length, does the fact that I might be able to find that exact same data string buried in a mountain of truly random data mean that it's suitable for crypto use? If you knew that I created the data from the clock of my workstation, then maybe not. But otherwise? Fred <admin@dcwill.com>
participants (1)
-
Fred