Wolfram on randomness and RNGs

Fri Sep 6 17:21:50 PDT 2002

It would seem that while the bitstream generated by the center column of
rule 30 might be a good random number source, its repeatability is the very
thing that detracts from its usefulness in cryptographic application. An
obviously poor application would be to have a "one time pad" where two
parties would xor their plaintext with the bitstream produced by rule 30,
starting at the top. While the resulting bitstream would appear random, an
attacker with knowledge of the algorithm could just run rule 30 themselves
and decode the result. To have cryptographically strong random numbers, one
needs to have an *unreproducable* source of randomness -- the very thing
that Wolfram seems to sneer at as being purely academic but that the above
methodology makes clear. While a slightly modified approach of having both
sides start at a secret row of rule 30 could be used, the key is now merely
the row number; defeating the purpose.

One interesting possibility might be to "seed" a wide row of rule 30 with
bits gleamed from the environment; this would make it difficult to reproduce
the bitstream without the bits representing the initial conditions, but
without continuing to add bits to rows, the "bit strength" of the randomness
is only the width of the seeded row (namely, if you're using 8 bits of
randomness to seed rule 30, an attacker could brute force the 256
possibilities to find your random bitstream).

The problem is, IMHO, exactly analogous to deriving randomness from
irrational numbers, such as the digits of pi, e, or the square root of two;
this just might be a slightly more efficient way to generate the bitstream.
The point is, they're all very good sources of randomness, but the fact that
their sequences are so well-defined keeps them from being a good source of
secrecy; picking out which portions of the sequence to use end up becoming
your secret and your sequence is truly only as unpredictable as this secret.

In another sense, the sequence you're using is only as strong as its inputs.

Just my $0.02; please bitchslap me if I got this wrong.

 David E. Weekly
 Founder & Executive Director
 California Community Colocation Project (an OPG project)
 http://CommunityColo.net/ - the world's first non-profit colo!

----- Original Message -----
From: "Steve Schear" <schear at lvcm.com>
To: <cypherpunks at lne.com>
Sent: Friday, September 06, 2002 1:57 PM
Subject: Wolfram on randomness and RNGs

> Background
> Stephen Wolfram's book, "A New Kind of Science," is nothing if not
> interesting.  This encyclopedia-sized volume traces how his fascination
> with cellular automata, beginning in the 1970s, led him to spend decades
> exploring the significance of complexity created from simple rules.
>
> I hope the following will not be too wordy and generate interest in the
> cryptographic implications of his work.
>
> Intrinsic Generation of Randomness
> In the chapter "Mechanisms and Programs in Nature," pp 297 - 361, he
> presents his case that behavioral similarities between certain simple
> programs and systems in nature are no coincidence but reflect a deep
> correspondence.  In this section he explores three mechanisms for
> randomness: external input (noise) captured in so-called stochastic
models,
> those related to initial conditions (e.g., chaos theory), and those based
> on the behavior of simple programs described in the book and which
believes
> are the most common in nature.
>
> Under the section "The Intrinsic Generation of Randomness" he presents
> evidence for his third mechanism in which no random input from the outside
> is needed, and in which the randomness is instead generated inside the
> systems themselves.
>
> "When one says that something seems random, what one usually means in
> practice is that one cannot see any regularities in it. So when we say
that
> a particular phenomenon in nature seems random, what we mean is that none
> of our standard methods of analysis have succeeded in finding regularities
> in it. To assess the randomness of a sequence produced by something like a
> cellular automaton, therefore, what we must do is to apply to it the same
> methods of analysis as we do to natural systems"
>
> ... some of these methods have been well codified in standard mathematics
> and statistics, while others are effectively implicit in our processes of
> visual and other perception. But the remarkable fact is that none of these
> methods seem to reveal any real regularities whatsoever in the rule 30
> cellular automaton sequence. And thus, so far as one can tell, this
> sequence is at least as random as anything we see in nature.
>
> But is it truly random?
>
> Over the past century or so, a variety of definitions of true randomness
> have been proposed. And according to most of these definitions, the
> sequence is indeed truly random. But there are a certain class of
> definitions which do not consider it truly random.
>
> For these definitions are based on the notion of classifying as truly
> random only sequences which can never be generated by any simple procedure
> whatsoever. Yet starting with a simple initial condition and then applying
> a simple cellular automaton rule constitutes a simple procedure. And as a
> result, the center column of rule 30 cannot be considered truly random
> according to such definitions.
>
> But while definitions of this type have a certain conceptual appeal, they
> are not likely to be useful in discussions of randomness in nature. For as
> we will see later in this book, it is almost certainly impossible for any
> natural process ever to generate a sequence which is guaranteed to be
truly
> random according to such definitions.
>
> For our purposes more useful definitions tend to concentrate not so much
on
> whether there exists in principle a simple way to generate a particular
> sequence, but rather on whether such a way can realistically be recognized
> by applying various kinds of analysis to the sequence. And as discussed
> above, there is good evidence that the center column of rule 30 is indeed
> random according to all reasonable definitions of this kind.
>
> So whether or not one chooses to say that the sequence is truly random, it
> is, as far as one can tell, at least random for all practical purposes.
And
> in fact sequences closely related to it have been used very successfully
as
> sources of randomness in practical computing.
>
> For many years, most kinds of computer systems and languages have had
> facilities for generating what they usually call random numbers. And in
> Mathematica  ever since it was first released Random [integer] has
> generated 0's and 1's using exactly the rule 30 cellular automaton.
>
> The way this works is that every time Random [Integer] is called, another
> step in the cellular automaton evolution is performed, and the value of
the
> cell in the center is returned. But one difference from the picture two
> pages ago is that for practical reasons the pattern is not allowed to grow
> wider and wider forever. Instead, it is wrapped around in a region that is
> a few hundred cells wide.
>
> One consequence of this, as discussed on page 259, is that the sequence of
> 0's and 1's that is generated must then eventually repeat. But even with
> the fastest foreseeable computers, the actual period of repetition will
> typically be more than a billion billion times the age of the universe.
>
> Another issue is that if one always ran the cellular automaton from page
> 315 with the particular initial condition shown there, then one would
> always get exactly the same sequence of 0's and 1's. But by using
different
> initial conditions one can get completely different sequences. And in
> practice if the initial conditions are not explicitly specified, what
> Mathematica does, for example, is to use as an initial condition a
> representation of various features of the exact state of the computer
> system at the time when Random was first called.
>
> The rule 30 cellular automaton provides a particularly clear and good
> example of intrinsic randomness generation. But in previous chapters we
> have seen many other examples of systems that also intrinsically produce
> apparent randomness. And it turns out that one of these is related to the
> method used since the late 1940s for generating random numbers in almost
> all practical computer systems.
>
> The pictures on the facing page show what happens if one successively
> multiplies a number by various constant factors, and then looks at the
> digit sequences of the numbers that result. As we first saw on page 119,
> the patterns of digits obtained in this way seem quite random. And the
idea
> of so-called linear congruential random number generators is precisely to
> make use of this randomness.
>
> For practical reasons, such generators typically keep only, say, the
> rightmost 31 digits in the numbers at each step. Yet even with this
> restriction, the sequences generated are random enough that at least until
> recently they were almost universally what was used as a source of
> randomness in practical computing.
>
> So in a sense linear congruential generators are another example of the
> general phenomenon of intrinsic randomness generation. But it turns out
> that in some respects they are rather unusual and misleading.
>
> Keeping only a limited number of digits at each step makes it inevitable
> that the sequences produced will eventually repeat. And one of the reasons
> for the popularity of linear congruential generators is that with fairly
> straightforward mathematical analysis it is possible to tell exactly what
> multiplication factors will maximize this repetition period.
>
> It has then often been assumed that having maximal repetition period will
> somehow imply maximum randomness in all aspects of the sequence one gets.
> But in practice over the years, one after another linear congruential
> generator that has been constructed to have maximal repetition period has
> turned out to exhibit very substantial deviations from perfect randomness.
>
> A typical kind of failure, illustrated in the pictures on the next page,
is
> that points with coordinates determined by successive numbers from the
> generator turn out to be distributed in an embarrassingly regular way. At
> first, such failures might suggest that more complicated schemes must be
> needed if one is to get good randomness. And indeed with this thought in
> mind all sorts of elaborate combinations of linear congruential and other
> generators have been proposed. But although some aspects of the behavior
of
> such systems can be made quite random, deviations from perfect randomness
> are still often found.
>
> And seeing this one might conclude that it must be essentially impossible
> to produce good randomness with any kind of system that has reasonably
> simple rules. But the rule 30 cellular automaton that we discussed above
> demonstrates that in fact this is absolutely not the case. Indeed, the
> rules for this cellular automaton are in some respects much simpler than
> for even a rather basic linear congruential generator. Yet the sequences
it
> produces seem perfectly random, and do not suffer from any of the problems
> that are typically found in linear congruential generators.
>
> So why do linear congruential generators not produce better randomness?
> Ironically, the basic reason is also the reason for their popularity. The
> point is that unlike the rule 30 cellular automaton that we discussed
> above, linear congruential generators are readily amenable to detailed
> mathematical analysis. And as a result, it is possible for example to
> guarantee that a particular generator will indeed have a maximal
repetition
> period.
>
> Almost inevitably, however, having such a maximal period implies a certain
> regularity. And in fact, as we shall see later in this book, the very
> possibility of any detailed mathematical analysis tends to imply the
> presence of at least some deviations from perfect randomness.
>
> But if one is not constrained by the need for such analysis, then as we
saw
> in the cellular automaton example above, remarkably simple rules can
> successfully generate highly random behavior.
>
> And indeed the existence of such simple rules is crucial in making it
> plausible that the general mechanism of intrinsic randomness generation
can
> be widespread in nature. For if the only way for intrinsic randomness
> generation to occur was through very complicated sets of rules, then one
> would expect that this mechanism would be seen in practice only in a few
> very special cases.
>
> But the fact that simple cellular automaton rules are sufficient to give
> rise to intrinsic randomness generation suggests that in reality it is
> rather easy for this mechanism to occur. And as a result, one can expect
> that the mechanism will be found often in nature.
>
> So how does the occurrence of this mechanism compare to the previous two
> mechanisms for randomness that we have discussed?
>
> The basic answer, I believe, is that whenever a large amount of randomness
> is produced in a short time, intrinsic randomness generation is
> overwhelmingly likely to be the mechanism responsible.
>
> We saw in the previous section that random details of the initial
> conditions for a system can lead to a certain amount of randomness in the
> behavior of a system. But as we discussed, there is in most practical
> situations a limit on the lengths of sequences whose randomness can
> realistically be attributed to such a mechanism. With intrinsic randomness
> generation, however, there is no such limit: in the cellular automaton
> above, for example, all one need do to get a longer random sequence is to
> run the cellular automaton for more steps.
>
> But it is also possible to get long random sequences by continual
> interaction with a random external environment, as in the first mechanism
> for randomness discussed in this chapter.
>
> The issue with this mechanism, however, is that it can take a long time to
> get a given amount of goodquality randomness from it. And the point is
that
> in most cases, intrinsic randomness generation can produce similar
> randomness in a much shorter time.
>
> Indeed, in general, intrinsic randomness generation tends to be much more
> efficient than getting randomness from the environment. The basic reason
is
> that intrinsic randomness generation in a sense puts all the components in
> a system to work in producing new randomness, while getting randomness
from
> the environment does not.
>
> Thus, for example, in the rule 30 cellular automaton discussed above,
every
> cell in effect actively contributes to the randomness we see. But in a
> system that just amplifies randomness from the environment, none of the
> components inside the system itself ever contribute any new randomness at
> all. Indeed, ironically enough, the more components that are involved in
> the process of amplification, the slower it will typically be to get each
> new piece of random output. For as we discussed two sections ago, each
> component in a sense adds what one can consider to be more inertia to the
> amplification process.
>
> But with a larger number of components it becomes progressively easier for
> randomness to be generated through intrinsic randomness generation. And
> indeed unless the underlying rules for the system somehow explicitly
> prevent it, it turns out in the end that intrinsic randomness generation
> will almost inevitably occuroften producing so much randomness that it
> completely swamps any randomness that might be produced from either of the
> other two mechanisms.
>
> Yet having said this, one can ask how one can tell in an actual experiment
> on some particular system in nature to what extent intrinsic randomness
> generation is really tie mechanism responsible for whatever seemingly
> random behavior one observed.
>
> The clearest sign is a somewhat unexpected phenomenon: that details of the
> random behavior can be repeatable from one run of the experiment to
> another. It is not surprising that general features of the behavior will
be
> the same. But what is remarkable is that if intrinsic randomness
generation
> is the mechanism at work, then the precise details of the behavior can
also
> be repeatable.
>
> In the mechanism where randomness comes from continual interaction with
the
> environment, no repeatability can be expected. For every time the
> experiment is run, the state of the environment will be different, and so
> the behavior one sees will also be correspondingly different. And
> similarly, in the mechanism where randomness comes from the details of
> initial conditions, there will again be little, if any, repeatability. For
> the details of the initial conditions are typically affected by the
> environment of the system, and cannot realistically be kept the same from
> one run to another.
>
> But the point is that with the mechanism of intrinsic randomness
> generation, there is no dependence on the environment. And as a result, so
> long as the setup of the system one is looking at remains the same, the
> behavior it produces will be exactly the same. Thus for example, however
> many times one runs a rule 30 cellular automaton, starting with a single
> black cell, the behavior one gets will always be exactly the same. And so
> for example the sequence of colors of the center cell, while seemingly
> random, will also be exactly the same."