Re: Wolfram on randomness and RNGs

7 Sep 2002

      ...
Background
Stephen Wolfram's book, "A New Kind of Science," is nothing if not
interesting.  This encyclopedia-sized volume traces how his fascination
with cellular automata, beginning in the 1970s, led him to spend decades
exploring the significance of complexity created from simple rules.
I hope the following will not be too wordy and generate interest in the
cryptographic implications of his work.
Intrinsic Generation of Randomness
In the chapter "Mechanisms and Programs in Nature," pp 297 - 361, he
presents his case that behavioral similarities between certain simple
programs and systems in nature are no coincidence but reflect a deep
correspondence.  In this section he explores three mechanisms for
randomness: external input (noise) captured in so-called stochastic
models,
those related to initial conditions (e.g., chaos theory), and those based
on the behavior of simple programs described in the book and which
believes
are the most common in nature.
Under the section "The Intrinsic Generation of Randomness" he presents
evidence for his third mechanism in which no random input from the outside
is needed, and in which the randomness is instead generated inside the
systems themselves.
"When one says that something seems random, what one usually means in
practice is that one cannot see any regularities in it. So when we say
...
a particular phenomenon in nature seems random, what we mean is that none
of our standard methods of analysis have succeeded in finding regularities
in it. To assess the randomness of a sequence produced by something like a
cellular automaton, therefore, what we must do is to apply to it the same
methods of analysis as we do to natural systems"
... some of these methods have been well codified in standard mathematics
and statistics, while others are effectively implicit in our processes of
visual and other perception. But the remarkable fact is that none of these
methods seem to reveal any real regularities whatsoever in the rule 30
cellular automaton sequence. And thus, so far as one can tell, this
sequence is at least as random as anything we see in nature.
But is it truly random?
Over the past century or so, a variety of definitions of true randomness
have been proposed. And according to most of these definitions, the
sequence is indeed truly random. But there are a certain class of
definitions which do not consider it truly random.
For these definitions are based on the notion of classifying as truly
random only sequences which can never be generated by any simple procedure
whatsoever. Yet starting with a simple initial condition and then applying
a simple cellular automaton rule constitutes a simple procedure. And as a
result, the center column of rule 30 cannot be considered truly random
according to such definitions.
But while definitions of this type have a certain conceptual appeal, they
are not likely to be useful in discussions of randomness in nature. For as
we will see later in this book, it is almost certainly impossible for any
natural process ever to generate a sequence which is guaranteed to be
...
random according to such definitions.
For our purposes more useful definitions tend to concentrate not so much
on
whether there exists in principle a simple way to generate a particular
sequence, but rather on whether such a way can realistically be recognized
by applying various kinds of analysis to the sequence. And as discussed
above, there is good evidence that the center column of rule 30 is indeed
random according to all reasonable definitions of this kind.
So whether or not one chooses to say that the sequence is truly random, it
is, as far as one can tell, at least random for all practical purposes.
And
in fact sequences closely related to it have been used very successfully
as
sources of randomness in practical computing.
For many years, most kinds of computer systems and languages have had
facilities for generating what they usually call random numbers. And in
Mathematica  ever since it was first released Random [integer] has
generated 0's and 1's using exactly the rule 30 cellular automaton.
The way this works is that every time Random [Integer] is called, another
step in the cellular automaton evolution is performed, and the value of
...
cell in the center is returned. But one difference from the picture two
pages ago is that for practical reasons the pattern is not allowed to grow
wider and wider forever. Instead, it is wrapped around in a region that is
a few hundred cells wide.
One consequence of this, as discussed on page 259, is that the sequence of
0's and 1's that is generated must then eventually repeat. But even with
the fastest foreseeable computers, the actual period of repetition will
typically be more than a billion billion times the age of the universe.
Another issue is that if one always ran the cellular automaton from page
315 with the particular initial condition shown there, then one would
always get exactly the same sequence of 0's and 1's. But by using
different
initial conditions one can get completely different sequences. And in
practice if the initial conditions are not explicitly specified, what
Mathematica does, for example, is to use as an initial condition a
representation of various features of the exact state of the computer
system at the time when Random was first called.
The rule 30 cellular automaton provides a particularly clear and good
example of intrinsic randomness generation. But in previous chapters we
have seen many other examples of systems that also intrinsically produce
apparent randomness. And it turns out that one of these is related to the
method used since the late 1940s for generating random numbers in almost
all practical computer systems.
The pictures on the facing page show what happens if one successively
multiplies a number by various constant factors, and then looks at the
digit sequences of the numbers that result. As we first saw on page 119,
the patterns of digits obtained in this way seem quite random. And the
idea
of so-called linear congruential random number generators is precisely to
make use of this randomness.
For practical reasons, such generators typically keep only, say, the
rightmost 31 digits in the numbers at each step. Yet even with this
restriction, the sequences generated are random enough that at least until
recently they were almost universally what was used as a source of
randomness in practical computing.
So in a sense linear congruential generators are another example of the
general phenomenon of intrinsic randomness generation. But it turns out
that in some respects they are rather unusual and misleading.
Keeping only a limited number of digits at each step makes it inevitable
that the sequences produced will eventually repeat. And one of the reasons
for the popularity of linear congruential generators is that with fairly
straightforward mathematical analysis it is possible to tell exactly what
multiplication factors will maximize this repetition period.
It has then often been assumed that having maximal repetition period will
somehow imply maximum randomness in all aspects of the sequence one gets.
But in practice over the years, one after another linear congruential
generator that has been constructed to have maximal repetition period has
turned out to exhibit very substantial deviations from perfect randomness.
A typical kind of failure, illustrated in the pictures on the next page,
is
that points with coordinates determined by successive numbers from the
generator turn out to be distributed in an embarrassingly regular way. At
first, such failures might suggest that more complicated schemes must be
needed if one is to get good randomness. And indeed with this thought in
mind all sorts of elaborate combinations of linear congruential and other
generators have been proposed. But although some aspects of the behavior
of
such systems can be made quite random, deviations from perfect randomness
are still often found.
And seeing this one might conclude that it must be essentially impossible
to produce good randomness with any kind of system that has reasonably
simple rules. But the rule 30 cellular automaton that we discussed above
demonstrates that in fact this is absolutely not the case. Indeed, the
rules for this cellular automaton are in some respects much simpler than
for even a rather basic linear congruential generator. Yet the sequences
it
produces seem perfectly random, and do not suffer from any of the problems
that are typically found in linear congruential generators.
So why do linear congruential generators not produce better randomness?
Ironically, the basic reason is also the reason for their popularity. The
point is that unlike the rule 30 cellular automaton that we discussed
above, linear congruential generators are readily amenable to detailed
mathematical analysis. And as a result, it is possible for example to
guarantee that a particular generator will indeed have a maximal
repetition
period.
Almost inevitably, however, having such a maximal period implies a certain
regularity. And in fact, as we shall see later in this book, the very
possibility of any detailed mathematical analysis tends to imply the
presence of at least some deviations from perfect randomness.
But if one is not constrained by the need for such analysis, then as we
saw
in the cellular automaton example above, remarkably simple rules can
successfully generate highly random behavior.
And indeed the existence of such simple rules is crucial in making it
plausible that the general mechanism of intrinsic randomness generation
can
be widespread in nature. For if the only way for intrinsic randomness
generation to occur was through very complicated sets of rules, then one
would expect that this mechanism would be seen in practice only in a few
very special cases.
But the fact that simple cellular automaton rules are sufficient to give
rise to intrinsic randomness generation suggests that in reality it is
rather easy for this mechanism to occur. And as a result, one can expect
that the mechanism will be found often in nature.
So how does the occurrence of this mechanism compare to the previous two
mechanisms for randomness that we have discussed?
The basic answer, I believe, is that whenever a large amount of randomness
is produced in a short time, intrinsic randomness generation is
overwhelmingly likely to be the mechanism responsible.
We saw in the previous section that random details of the initial
conditions for a system can lead to a certain amount of randomness in the
behavior of a system. But as we discussed, there is in most practical
situations a limit on the lengths of sequences whose randomness can
realistically be attributed to such a mechanism. With intrinsic randomness
generation, however, there is no such limit: in the cellular automaton
above, for example, all one need do to get a longer random sequence is to
run the cellular automaton for more steps.
But it is also possible to get long random sequences by continual
interaction with a random external environment, as in the first mechanism
for randomness discussed in this chapter.
The issue with this mechanism, however, is that it can take a long time to
get a given amount of goodquality randomness from it. And the point is
...
in most cases, intrinsic randomness generation can produce similar
randomness in a much shorter time.
Indeed, in general, intrinsic randomness generation tends to be much more
efficient than getting randomness from the environment. The basic reason
is
that intrinsic randomness generation in a sense puts all the components in
a system to work in producing new randomness, while getting randomness
from
the environment does not.
Thus, for example, in the rule 30 cellular automaton discussed above,
every
cell in effect actively contributes to the randomness we see. But in a
system that just amplifies randomness from the environment, none of the
components inside the system itself ever contribute any new randomness at
all. Indeed, ironically enough, the more components that are involved in
the process of amplification, the slower it will typically be to get each
new piece of random output. For as we discussed two sections ago, each
component in a sense adds what one can consider to be more inertia to the
amplification process.
But with a larger number of components it becomes progressively easier for
randomness to be generated through intrinsic randomness generation. And
indeed unless the underlying rules for the system somehow explicitly
prevent it, it turns out in the end that intrinsic randomness generation
will almost inevitably occuroften producing so much randomness that it
completely swamps any randomness that might be produced from either of the
other two mechanisms.
Yet having said this, one can ask how one can tell in an actual experiment
on some particular system in nature to what extent intrinsic randomness
generation is really tie mechanism responsible for whatever seemingly
random behavior one observed.
The clearest sign is a somewhat unexpected phenomenon: that details of the
random behavior can be repeatable from one run of the experiment to
another. It is not surprising that general features of the behavior will
be
the same. But what is remarkable is that if intrinsic randomness
generation
is the mechanism at work, then the precise details of the behavior can
also
be repeatable.
In the mechanism where randomness comes from continual interaction with
It would seem that while the bitstream generated by the center column of
rule 30 might be a good random number source, its repeatability is the very
thing that detracts from its usefulness in cryptographic application. An
obviously poor application would be to have a "one time pad" where two
parties would xor their plaintext with the bitstream produced by rule 30,
starting at the top. While the resulting bitstream would appear random, an
attacker with knowledge of the algorithm could just run rule 30 themselves
and decode the result. To have cryptographically strong random numbers, one
needs to have an *unreproducable* source of randomness -- the very thing
that Wolfram seems to sneer at as being purely academic but that the above
methodology makes clear. While a slightly modified approach of having both
sides start at a secret row of rule 30 could be used, the key is now merely
the row number; defeating the purpose.

One interesting possibility might be to "seed" a wide row of rule 30 with
bits gleamed from the environment; this would make it difficult to reproduce
the bitstream without the bits representing the initial conditions, but
without continuing to add bits to rows, the "bit strength" of the randomness
is only the width of the seeded row (namely, if you're using 8 bits of
randomness to seed rule 30, an attacker could brute force the 256
possibilities to find your random bitstream).

The problem is, IMHO, exactly analogous to deriving randomness from
irrational numbers, such as the digits of pi, e, or the square root of two;
this just might be a slightly more efficient way to generate the bitstream.
The point is, they're all very good sources of randomness, but the fact that
their sequences are so well-defined keeps them from being a good source of
secrecy; picking out which portions of the sequence to use end up becoming
your secret and your sequence is truly only as unpredictable as this secret.

In another sense, the sequence you're using is only as strong as its inputs.

Just my $0.02; please bitchslap me if I got this wrong.

 David E. Weekly
 Founder & Executive Director
 California Community Colocation Project (an OPG project)
 http://CommunityColo.net/ - the world's first non-profit colo!

----- Original Message -----
From: "Steve Schear" <schear@lvcm.com>
To: <cypherpunks@lne.com>
Sent: Friday, September 06, 2002 1:57 PM
Subject: Wolfram on randomness and RNGs

that
truly
the
that
the
...
environment, no repeatability can be expected. For every time the
experiment is run, the state of the environment will be different, and so
the behavior one sees will also be correspondingly different. And
similarly, in the mechanism where randomness comes from the details of
initial conditions, there will again be little, if any, repeatability. For
the details of the initial conditions are typically affected by the
environment of the system, and cannot realistically be kept the same from
one run to another.
But the point is that with the mechanism of intrinsic randomness
generation, there is no dependence on the environment. And as a result, so
long as the setup of the system one is looking at remains the same, the
behavior it produces will be exactly the same. Thus for example, however
many times one runs a rule 30 cellular automaton, starting with a single
black cell, the behavior one gets will always be exactly the same. And so
for example the sequence of colors of the center cell, while seemingly
random, will also be exactly the same."

David E. Weekly

tags

participants (1)