[Cryptography] randomness +- entropy

Jerry Leichter leichter at lrw.com
Tue Nov 5 04:18:58 PST 2013


On Nov 4, 2013, at 2:21 PM, John Denker wrote:
> Some people have been throwing around the word "entropy" 
> rather carelessly.
> 
> Entropy means something very special.
Well, yes, and in its technical definitions - either in thermodynamics, where it arose, or in information theory, where it was imported because of a similarity in formalisms - it plays virtually no role in cryptographic discussion.  In cryptography, especially when discussing random number generators, the word has a loosey-goosey meaning that's somehow tied to lack of predictability - but in an unspecified way.  When people make it a regular habit to "guess" the entropy of various processes and then go on to build systems based on those guesses, you know the word has lost any formal meaning it ever had.
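(For reference - this is the textbook definition, not something from John's post: for a source emitting symbol x with probability p(x), Shannon's entropy is

    H(X) = - \sum_x p(x) \log_2 p(x)    [bits per symbol]

It is a property of a *distribution*, not of any particular observed output - which is exactly why "guessing" the entropy of a process you have no model for is so dubious.)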

While the distinctions you draw are valid and important, I'm afraid "entropy" no longer has the capability to distinguish them.

> For a great many cryptological purposes, a high-quality 
> PSEUDO-random distribution is good enough, even though its 
> entropy density is very low.  Note the contrast:
> 
>  TRNG entropy density = 1 - epsilon
>  PRNG entropy density =   epsilon
I don't know what this means.

> As another way of emphasizing the distinction:  a PRNG places 
> orders-of-magnitude harsher demands on the strength of the 
> cryptological primitives it uses.
I don't know what *this* means either.  I drew a distinction in an earlier post - a distinction you can find in many papers on cryptographic primitives - between random values (unpredictable to the attackers being considered) and nonces (never repeated in a given cryptographic context, but there are no assumptions about unpredictability).  Where a random n-bit value is specified, I can think of no paper that does not assume what we call "n bits of entropy" - though a better way to say it is "a value chosen uniformly at random from the set of n-bit strings".  Sometimes the value is supposed to be chosen uniformly at random from some other set - e.g., from Z/p, i.e., between 0 and p-1.  Trying to state this in terms of entropy is a losing game - and in fact it isn't actually trivial, given a random *bit* source, to produce such a distribution.  People have gotten it wrong in the past.  (The obvious technique of choosing a k with 2^k > p, choosing a "random" k-bit value, and then reducing it mod p, if described in terms of "entropy", looks fine - I have k bits of entropy, which is more than enough to cover the choice I need to make.  But the output is biased.)
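(To make the bias concrete, here's a minimal sketch - mine, with toy parameters, not a vetted implementation - contrasting the naive reduction with rejection sampling, the usual fix:

    # Sketch: why reducing a random k-bit value mod p is biased,
    # and the usual fix, rejection sampling.
    import secrets
    from collections import Counter

    p = 5   # toy modulus; any p that is not a power of 2 shows the effect
    k = 3   # smallest k with 2^k > p

    def biased(p, k):
        # Naive: take k random bits, reduce mod p.  The residues
        # 0 .. (2^k mod p) - 1 land twice as often as the rest.
        return secrets.randbits(k) % p

    def unbiased(p, k):
        # Rejection sampling: discard out-of-range draws and retry.
        # Every accepted value is uniform on 0 .. p-1.
        while True:
            x = secrets.randbits(k)
            if x < p:
                return x

    n = 100_000
    print("naive:    ", sorted(Counter(biased(p, k) for _ in range(n)).items()))
    print("rejection:", sorted(Counter(unbiased(p, k) for _ in range(n)).items()))
    # With k = 3, p = 5: values 0, 1, 2 each appear with probability 2/8
    # under the naive method, 3 and 4 with probability 1/8 - a 2:1 skew,
    # even though the input had "3 bits of entropy".

The cost of rejection sampling is a variable number of retries; the payoff is an exactly uniform output.)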

> This can be quantified 
> in terms of classical cryptologic ideas such as unicity 
> distance, but for present purposes I prefer the "entropy 
> density" language.
"Unicity distance" is a nice but different concept - and one with little bearing on cryptography today.  (If I draw a cipher E at random from some a collection of functions - e.g., by choosing a key - then the unicity distance is the unicity distance is just the number of pairs (x, E(x)) that specify E uniquely within the set.  I suppose you can stretch this to talk about how many samples from the "random" generator are needed to specify *it* uniquely - i.e., be able to determine all past and future states - but I've seen the term used that way.)  A broader - hence more useful - classical term (which may have been introduced by Shannon) is "equivocation".  I don't recall the formal definition, but it attempts to capture the uncertainty an attacker has about the next plaintext, given all the information he already has.  If I know a message was encrypted using a one-time pad, my uncertainty about the first character is not absolute - it's much more likely to be "t" than "z".  T
 o provide information-theoretic security, a one-time-pad must contain "enough randomness" to keep my a priori and a postiori equivocation equal.  This *is* something that can be given in terms of the entropies of the message and one-time-pad sources, but the deracinated notion of "entropy" one sees in most discussions is way too weak to say anything useful here.)
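(For the record, the standard definition - Shannon's, from his 1949 secrecy paper: equivocation is the conditional entropy of the message given the ciphertext,

    H(M|C) = - \sum_c p(c) \sum_m p(m|c) \log_2 p(m|c)

and perfect secrecy is exactly the condition H(M|C) = H(M) - seeing the ciphertext leaves the attacker's uncertainty about the message unchanged.)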

> The rubber meets the road here:  Consider the contrast:
> 
> PRNG:  I am quite sure that on startup the machine needs to 
>  have on board a crypographically strong, well-seeded PRNG.
>  This needs to be up and running very, very early in the 
>  boot-up process.  Some things that need the PRNG cannot
>  wait.
> 
> TRNG:  At the moment I have no firm opinions as to how much 
>  actual entropy the machine needs on start-up.  I look 
>  forward to having a discussion on this topic, with use-case 
>  scenarios et cetera.
A BBS generator is "indistinguishable" from a true random number generator.  What's missing from that statement - and from the distinction you're drawing above - is a specification of the attack model.
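(To pin down what "indistinguishable" means here - this is the standard textbook formulation, supplied for reference: a generator G stretching s-bit seeds into n-bit outputs is a secure PRNG if, for every probabilistic polynomial-time distinguisher D,

    | Pr[D(G(U_s)) = 1] - Pr[D(U_n) = 1] |  <=  negl(s)

where U_k denotes the uniform distribution on k-bit strings.  Both the polynomial bound and the hidden, uniformly chosen seed are baked into that statement - which is the point of what follows.)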

The BBS generator has inherent limitations that are given in its proof of correctness:  The attacker has polynomially bounded resources (in fact, "attacker" literally means a polynomially-bounded probabilistic TM), which in turn implies that it only has access to a polynomially-bounded number of outputs.  These are generally fine.  The proof doesn't bother to state (though it's implicit) the obvious:  that the attacker doesn't have access to the internal state of the generator.  This isn't an "entropy" assumption - the internal state must, to the attacker, appear to have been chosen uniformly at random from all possible internal states, and is not available to the attacker.  If you want to use "entropy-talk", all the entropy in a BBS generator is there, at the start, in the choice of the initial state.  And yet there's a profound difference between the output of a BBS generator and the output of, say, a linear congruential PRNG starting with exactly the same state.  One is predictable given a few successive samples; the other is secure given any polynomially-bounded number of them.  "Entropy" simply cannot capture this distinction.  And it's in fact exactly the distinction - in the transition from Shannon's classic notions of information-based security to modern complexity-based ones - that makes it possible to say anything useful at all about encryption algorithms other than one-time pads.
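(A minimal sketch of the two generators being contrasted - my toy parameters, and nothing here is remotely secure; real BBS needs n = p*q for large Blum primes p, q = 3 mod 4 and a seed coprime to n:

    def bbs_bits(seed, n, count):
        # Blum-Blum-Shub: x_{i+1} = x_i^2 mod n, emitting only the low
        # bit of each state.  An attacker who sees the bits but not the
        # state faces (provably, assuming factoring is hard) an
        # intractable prediction problem.
        x = seed
        bits = []
        for _ in range(count):
            x = (x * x) % n
            bits.append(x & 1)
        return bits

    def lcg_values(seed, a, c, m, count):
        # Linear congruential: x_{i+1} = (a*x_i + c) mod m.  The state
        # *is* the output, so a handful of consecutive samples determine
        # all past and future values.
        x = seed
        out = []
        for _ in range(count):
            x = (a * x + c) % m
            out.append(x)
        return out

    n = 11 * 23   # toy Blum modulus: 11 and 23 are both = 3 mod 4
    print("BBS bits:", bbs_bits(seed=3, n=n, count=16))
    print("LCG:     ", lcg_values(seed=3, a=1103515245, c=12345, m=2**31, count=4))

Same kind of hidden starting state in both cases; only the BBS construction turns it into outputs that resist a bounded attacker.)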

While we have no proofs like those for BBS for PRNG's built on practical cryptographic primitives, we generally assume that they have similar properties.  (More correctly, we can prove some properties given assumptions about others.  But those are the same assumptions we make about the actual encryption and hashing and signature algorithms we use.  If they fail, the whole system falls down regardless of the source of "random" values.)

To summarize:  The distinction between cryptographic PRNG's and "true" RNG's has to do with the attack model.  The attacker considered in the former is polynomially bounded (OK) and doesn't receive as input some special piece that we label "internal state" (this one can be violated).  The attacker against a "true" RNG is ... what?  Formalizing that is equivalent to formalizing randomness - which no one has managed to do.  (In fact, modern developments of probability theory don't even try - random distributions are simply among the axioms.  Even more, if you believe John Conway, the "Free Will" Theorem he and Simon Kochen proved shows that "quantum unpredictability" is *stronger* than randomness!)  In practical cryptography, this translates into "whatever I can imagine".  In my other thread on plausible attacks, I tried to focus on limiting the attacker to, well, plausible operations in a particular real-world setting.  If you *don't* do that, you can make no progress at all.  A hardware generator - even Turbid - is vulnerable if I include as plausible an attacker who's infiltrated every supplier of electronic components in the world and has slipped a small transmitter into everything built, including every diode, every resistor - hell, maybe every piece of wire!

I have no problem with designing strong, practical random number generators.  I'd love to see them deployed more widely.  But clever designs of low-level primitives, as important as they are, are not a substitute for *system* designs; in fact, they can blind us to the necessary system properties.  If a cryptographic primitive needs some unpredictable input, we need to go further and ask (a) how much? (b) unpredictable under what attack model?  Only then can we begin to answer the *system* design question of "what's a suitable generator for such inputs?"  If we can, for a reasonable cost (measured in any appropriate way), get our hands on a generator whose attack model assumes far more attacker resources than any attack on the uses of those values does - great, let's make use of it.

Too much discussion of "random number generators" is the equivalent of "I'm not sure AES is strong enough, so I'll do a ROT-13 encoding first - it can't hurt".  And it can't - until you run into Tony Hoare's comment to the effect that "You can make a system so simple that you can see at a glance that it's correct, or so complex that you can't see at a glance that it's *not* correct."
                                                        -- Jerry


