Re: Wolfram on randomness and RNGs
Background Stephen Wolfram's book, "A New Kind of Science," is nothing if not interesting. This encyclopedia-sized volume traces how his fascination with cellular automata, beginning in the 1970s, led him to spend decades exploring the significance of complexity created from simple rules.
I hope the following will not be too wordy and generate interest in the cryptographic implications of his work.
Intrinsic Generation of Randomness In the chapter "Mechanisms and Programs in Nature," pp 297 - 361, he presents his case that behavioral similarities between certain simple programs and systems in nature are no coincidence but reflect a deep correspondence. In this section he explores three mechanisms for randomness: external input (noise) captured in so-called stochastic models, those related to initial conditions (e.g., chaos theory), and those based on the behavior of simple programs described in the book and which believes are the most common in nature.
Under the section "The Intrinsic Generation of Randomness" he presents evidence for his third mechanism in which no random input from the outside is needed, and in which the randomness is instead generated inside the systems themselves.
"When one says that something seems random, what one usually means in practice is that one cannot see any regularities in it. So when we say
a particular phenomenon in nature seems random, what we mean is that none of our standard methods of analysis have succeeded in finding regularities in it. To assess the randomness of a sequence produced by something like a cellular automaton, therefore, what we must do is to apply to it the same methods of analysis as we do to natural systems"
... some of these methods have been well codified in standard mathematics and statistics, while others are effectively implicit in our processes of visual and other perception. But the remarkable fact is that none of these methods seem to reveal any real regularities whatsoever in the rule 30 cellular automaton sequence. And thus, so far as one can tell, this sequence is at least as random as anything we see in nature.
But is it truly random?
Over the past century or so, a variety of definitions of true randomness have been proposed. And according to most of these definitions, the sequence is indeed truly random. But there are a certain class of definitions which do not consider it truly random.
For these definitions are based on the notion of classifying as truly random only sequences which can never be generated by any simple procedure whatsoever. Yet starting with a simple initial condition and then applying a simple cellular automaton rule constitutes a simple procedure. And as a result, the center column of rule 30 cannot be considered truly random according to such definitions.
But while definitions of this type have a certain conceptual appeal, they are not likely to be useful in discussions of randomness in nature. For as we will see later in this book, it is almost certainly impossible for any natural process ever to generate a sequence which is guaranteed to be
random according to such definitions.
For our purposes more useful definitions tend to concentrate not so much on whether there exists in principle a simple way to generate a particular sequence, but rather on whether such a way can realistically be recognized by applying various kinds of analysis to the sequence. And as discussed above, there is good evidence that the center column of rule 30 is indeed random according to all reasonable definitions of this kind.
So whether or not one chooses to say that the sequence is truly random, it is, as far as one can tell, at least random for all practical purposes. And in fact sequences closely related to it have been used very successfully as sources of randomness in practical computing.
For many years, most kinds of computer systems and languages have had facilities for generating what they usually call random numbers. And in Mathematica ever since it was first released Random [integer] has generated 0's and 1's using exactly the rule 30 cellular automaton.
The way this works is that every time Random [Integer] is called, another step in the cellular automaton evolution is performed, and the value of
cell in the center is returned. But one difference from the picture two pages ago is that for practical reasons the pattern is not allowed to grow wider and wider forever. Instead, it is wrapped around in a region that is a few hundred cells wide.
One consequence of this, as discussed on page 259, is that the sequence of 0's and 1's that is generated must then eventually repeat. But even with the fastest foreseeable computers, the actual period of repetition will typically be more than a billion billion times the age of the universe.
Another issue is that if one always ran the cellular automaton from page 315 with the particular initial condition shown there, then one would always get exactly the same sequence of 0's and 1's. But by using different initial conditions one can get completely different sequences. And in practice if the initial conditions are not explicitly specified, what Mathematica does, for example, is to use as an initial condition a representation of various features of the exact state of the computer system at the time when Random was first called.
The rule 30 cellular automaton provides a particularly clear and good example of intrinsic randomness generation. But in previous chapters we have seen many other examples of systems that also intrinsically produce apparent randomness. And it turns out that one of these is related to the method used since the late 1940s for generating random numbers in almost all practical computer systems.
The pictures on the facing page show what happens if one successively multiplies a number by various constant factors, and then looks at the digit sequences of the numbers that result. As we first saw on page 119, the patterns of digits obtained in this way seem quite random. And the idea of so-called linear congruential random number generators is precisely to make use of this randomness.
For practical reasons, such generators typically keep only, say, the rightmost 31 digits in the numbers at each step. Yet even with this restriction, the sequences generated are random enough that at least until recently they were almost universally what was used as a source of randomness in practical computing.
So in a sense linear congruential generators are another example of the general phenomenon of intrinsic randomness generation. But it turns out that in some respects they are rather unusual and misleading.
Keeping only a limited number of digits at each step makes it inevitable that the sequences produced will eventually repeat. And one of the reasons for the popularity of linear congruential generators is that with fairly straightforward mathematical analysis it is possible to tell exactly what multiplication factors will maximize this repetition period.
It has then often been assumed that having maximal repetition period will somehow imply maximum randomness in all aspects of the sequence one gets. But in practice over the years, one after another linear congruential generator that has been constructed to have maximal repetition period has turned out to exhibit very substantial deviations from perfect randomness.
A typical kind of failure, illustrated in the pictures on the next page, is that points with coordinates determined by successive numbers from the generator turn out to be distributed in an embarrassingly regular way. At first, such failures might suggest that more complicated schemes must be needed if one is to get good randomness. And indeed with this thought in mind all sorts of elaborate combinations of linear congruential and other generators have been proposed. But although some aspects of the behavior of such systems can be made quite random, deviations from perfect randomness are still often found.
And seeing this one might conclude that it must be essentially impossible to produce good randomness with any kind of system that has reasonably simple rules. But the rule 30 cellular automaton that we discussed above demonstrates that in fact this is absolutely not the case. Indeed, the rules for this cellular automaton are in some respects much simpler than for even a rather basic linear congruential generator. Yet the sequences it produces seem perfectly random, and do not suffer from any of the problems that are typically found in linear congruential generators.
So why do linear congruential generators not produce better randomness? Ironically, the basic reason is also the reason for their popularity. The point is that unlike the rule 30 cellular automaton that we discussed above, linear congruential generators are readily amenable to detailed mathematical analysis. And as a result, it is possible for example to guarantee that a particular generator will indeed have a maximal repetition period.
Almost inevitably, however, having such a maximal period implies a certain regularity. And in fact, as we shall see later in this book, the very possibility of any detailed mathematical analysis tends to imply the presence of at least some deviations from perfect randomness.
But if one is not constrained by the need for such analysis, then as we saw in the cellular automaton example above, remarkably simple rules can successfully generate highly random behavior.
And indeed the existence of such simple rules is crucial in making it plausible that the general mechanism of intrinsic randomness generation can be widespread in nature. For if the only way for intrinsic randomness generation to occur was through very complicated sets of rules, then one would expect that this mechanism would be seen in practice only in a few very special cases.
But the fact that simple cellular automaton rules are sufficient to give rise to intrinsic randomness generation suggests that in reality it is rather easy for this mechanism to occur. And as a result, one can expect that the mechanism will be found often in nature.
So how does the occurrence of this mechanism compare to the previous two mechanisms for randomness that we have discussed?
The basic answer, I believe, is that whenever a large amount of randomness is produced in a short time, intrinsic randomness generation is overwhelmingly likely to be the mechanism responsible.
We saw in the previous section that random details of the initial conditions for a system can lead to a certain amount of randomness in the behavior of a system. But as we discussed, there is in most practical situations a limit on the lengths of sequences whose randomness can realistically be attributed to such a mechanism. With intrinsic randomness generation, however, there is no such limit: in the cellular automaton above, for example, all one need do to get a longer random sequence is to run the cellular automaton for more steps.
But it is also possible to get long random sequences by continual interaction with a random external environment, as in the first mechanism for randomness discussed in this chapter.
The issue with this mechanism, however, is that it can take a long time to get a given amount of goodquality randomness from it. And the point is
in most cases, intrinsic randomness generation can produce similar randomness in a much shorter time.
Indeed, in general, intrinsic randomness generation tends to be much more efficient than getting randomness from the environment. The basic reason is that intrinsic randomness generation in a sense puts all the components in a system to work in producing new randomness, while getting randomness from the environment does not.
Thus, for example, in the rule 30 cellular automaton discussed above, every cell in effect actively contributes to the randomness we see. But in a system that just amplifies randomness from the environment, none of the components inside the system itself ever contribute any new randomness at all. Indeed, ironically enough, the more components that are involved in the process of amplification, the slower it will typically be to get each new piece of random output. For as we discussed two sections ago, each component in a sense adds what one can consider to be more inertia to the amplification process.
But with a larger number of components it becomes progressively easier for randomness to be generated through intrinsic randomness generation. And indeed unless the underlying rules for the system somehow explicitly prevent it, it turns out in the end that intrinsic randomness generation will almost inevitably occuroften producing so much randomness that it completely swamps any randomness that might be produced from either of the other two mechanisms.
Yet having said this, one can ask how one can tell in an actual experiment on some particular system in nature to what extent intrinsic randomness generation is really tie mechanism responsible for whatever seemingly random behavior one observed.
The clearest sign is a somewhat unexpected phenomenon: that details of the random behavior can be repeatable from one run of the experiment to another. It is not surprising that general features of the behavior will be the same. But what is remarkable is that if intrinsic randomness generation is the mechanism at work, then the precise details of the behavior can also be repeatable.
In the mechanism where randomness comes from continual interaction with
It would seem that while the bitstream generated by the center column of rule 30 might be a good random number source, its repeatability is the very thing that detracts from its usefulness in cryptographic application. An obviously poor application would be to have a "one time pad" where two parties would xor their plaintext with the bitstream produced by rule 30, starting at the top. While the resulting bitstream would appear random, an attacker with knowledge of the algorithm could just run rule 30 themselves and decode the result. To have cryptographically strong random numbers, one needs to have an *unreproducable* source of randomness -- the very thing that Wolfram seems to sneer at as being purely academic but that the above methodology makes clear. While a slightly modified approach of having both sides start at a secret row of rule 30 could be used, the key is now merely the row number; defeating the purpose. One interesting possibility might be to "seed" a wide row of rule 30 with bits gleamed from the environment; this would make it difficult to reproduce the bitstream without the bits representing the initial conditions, but without continuing to add bits to rows, the "bit strength" of the randomness is only the width of the seeded row (namely, if you're using 8 bits of randomness to seed rule 30, an attacker could brute force the 256 possibilities to find your random bitstream). The problem is, IMHO, exactly analogous to deriving randomness from irrational numbers, such as the digits of pi, e, or the square root of two; this just might be a slightly more efficient way to generate the bitstream. The point is, they're all very good sources of randomness, but the fact that their sequences are so well-defined keeps them from being a good source of secrecy; picking out which portions of the sequence to use end up becoming your secret and your sequence is truly only as unpredictable as this secret. In another sense, the sequence you're using is only as strong as its inputs. Just my $0.02; please bitchslap me if I got this wrong. David E. Weekly Founder & Executive Director California Community Colocation Project (an OPG project) http://CommunityColo.net/ - the world's first non-profit colo! ----- Original Message ----- From: "Steve Schear" <schear@lvcm.com> To: <cypherpunks@lne.com> Sent: Friday, September 06, 2002 1:57 PM Subject: Wolfram on randomness and RNGs that truly the that the
environment, no repeatability can be expected. For every time the experiment is run, the state of the environment will be different, and so the behavior one sees will also be correspondingly different. And similarly, in the mechanism where randomness comes from the details of initial conditions, there will again be little, if any, repeatability. For the details of the initial conditions are typically affected by the environment of the system, and cannot realistically be kept the same from one run to another.
But the point is that with the mechanism of intrinsic randomness generation, there is no dependence on the environment. And as a result, so long as the setup of the system one is looking at remains the same, the behavior it produces will be exactly the same. Thus for example, however many times one runs a rule 30 cellular automaton, starting with a single black cell, the behavior one gets will always be exactly the same. And so for example the sequence of colors of the center cell, while seemingly random, will also be exactly the same."
participants (1)
-
David E. Weekly