probability question for math-heads
I'm too tired and/or busy to work this out via Knuth -- maybe you can help, with some implications for the DES keysearch strategy.

What is the expected distribution, in a "random" binary sequence -- with all the fuzziness that implies as to what _exactly_ is "random" -- of gaps between runs of same-bits? I.e., what is the expected distribution of sequence length between occurrences of two (and only two) 1-bits in a row? How about sequences of 3 1-bits? Etc.

We know that in a _truly_ random sequence, taken over a long enough period, there should be all possible values of "gaps". But what is it reasonable to expect in a "random" sequence as to how those gaps are distributed? Is my question equivalent to Knuth's gap test?

If anyone feels like proffering some education on this: if I find anything useful in my investigations I'll certainly credit the help! TIA, etc. -- and hey: doesn't Nickelodeon have a trademark on GAK?
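[The question as posed can be probed empirically. A minimal sketch, not from the original post: it takes "two (and only two) 1-bits in a row" to mean a maximal run of exactly two 1s, and tabulates start-to-start distances between such runs in a pseudo-random bit string. Function names and sizes are illustrative.]

```python
import random
import re
from collections import Counter

def gap_counts(n_bits=100_000, run_len=2, seed=3):
    # Tabulate start-to-start distances between successive maximal runs
    # of exactly `run_len` 1-bits in a pseudo-random bit string.
    rng = random.Random(seed)
    bits = ''.join(str(rng.getrandbits(1)) for _ in range(n_bits))
    # A maximal run: run_len 1s with no 1 immediately before or after.
    pattern = r'(?<!1)1{%d}(?!1)' % run_len
    starts = [m.start() for m in re.finditer(pattern, bits)]
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return Counter(gaps)

hist = gap_counts()
mean_gap = sum(g * n for g, n in hist.items()) / sum(hist.values())
# Each position starts such a run with probability (1/2)^4 = 1/16,
# so the mean spacing should come out near 16.
print(round(mean_gap, 1))
```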
On Wed, 23 Oct 1996, geeman@best.com wrote:
I'm too tired &/or busy to work this out, via Knuth --- maybe you can help, with some implications for the DES keysearch strategy.
What is the expected distribution, in a "random" binary sequence -- with all the fuzziness that implies as to what _exactly_ is "random" -- of gaps between runs of same-bits?
I.e., what is the expected distribution of sequence length between occurrences of two (and only two) 1-bits in a row? How about sequences of 3 1-bits? Etc.
We know that in a _truly_ random sequence, taken over a long enough period, there should be all possible values of "gaps". But what is reasonable to expect in a "random" sequence as to how those gaps are distributed? Is my question equivalent to Knuth's gap test?
If anyone feels like proffering some education on this, if I find anything useful in my investigations I'll certainly credit the help!
The math heads are busy doing math, so I'll answer instead.

The probability goes 1:2^1... 1:2^2... 1:2^3... 1:2^4... Two bits hold 4 possible values, 2 of which are all-alike. Three bits hold 8 possible values, 2 of which are all-alike, of course. So 1:2^(bits-1) is exactly what you would expect from a RANDOM sequence.

This was also implied in the followups to your previous thread re: DES keyspace optimization. Remember the Gambler's Fallacy...

Regards, Doug
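[Doug's 1:2^(bits-1) figure is easy to sanity-check by simulation. A minimal sketch; the function name, trial count, and seed are my choices, not from the thread.]

```python
import random

def prob_all_alike(n_bits, trials=100_000, seed=1):
    # Estimate the probability that n_bits consecutive random bits are
    # all the same (all 0s or all 1s); theory says 1 / 2**(n_bits - 1).
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        first = rng.getrandbits(1)
        if all(rng.getrandbits(1) == first for _ in range(n_bits - 1)):
            hits += 1
    return hits / trials

for n in (2, 3, 4):
    print(n, prob_all_alike(n), 1 / 2 ** (n - 1))
```

The estimates should land close to 1/2, 1/4, and 1/8 respectively.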
At 11:28 PM +0000 10/23/96, geeman@best.com wrote:
I'm too tired &/or busy to work this out, via Knuth --- maybe you can help, with some implications for the DES keysearch strategy.
What is the expected distribution, in a "random" binary sequence -- with all the fuzziness that implies as to what _exactly_ is "random" -- of gaps between runs of same-bits?
I.e., what is the expected distribution of sequence length between occurrences of two (and only two) 1-bits in a row? How about sequences of 3 1-bits? Etc.
We know that in a _truly_ random sequence, taken over a long enough period, there should be all possible values of "gaps". But what is reasonable to expect in a "random" sequence as to how those gaps are distributed? Is my question equivalent to Knuth's gap test?
It obviously depends on your definitions of period, gap, clusters, patterns, etc. These would have to be pinned down before a precise answer could be given. For example, whether the sequence is 30 digits long, or 5 digits long, etc.

But the general _framework_ for looking at this problem would probably be the Poisson distribution. That is, if the _expected_ ("average") number of gaps, runs, whatever, is "m", then the probability that the number "s" of actual gaps/runs is 0, 1, 2, 3, 4, ..., n is given by the Poisson distribution:

P[s; m] = (exp(-m)) (m^s) / (s!)

(I read this like this: "The Poisson probability of seeing s occurrences given that m are expected is e to the minus m, times m to the s, divided by s factorial." I literally remember the formula from this verbal pattern, having used it much in my earlier career.)

Thus, if one _expected_ to see m = 3 occurrences of some event, then one would likely see 0 occurrences

(exp(-3)) (3^0) / (0!) = 0.0498, about 5% of the time,

1 occurrence

(exp(-3)) (3^1) / (1!) = 0.149, about 15% of the time,

2 occurrences

(exp(-3)) (3^2) / (2!) = 0.224, about 22.4% of the time,

and so on, even having a chance of seeing s = 10 occurrences given by:

(exp(-3)) (3^10) / (10!) = 0.00081

As expected, the cumulative probability of all the occurrences is 1.00. (In fact, the Poisson distribution can be derived from some very basic considerations, such as that the sum of all outcomes must be 1, and that the past history of occurrences does not affect the next outcome.)

In this example, I would put the expected value of the "gap" between the same values (e.g., gap between 1s, gap between 0s) at exactly 0.5 for this random sequence. (This is the weakest part of this post, as it's possible I'm falling into a kind of trap by putting the expected value at one half. If anyone thinks differently, e.g. that it should be "1", we can discuss this. But the framework is otherwise unchanged.) Thus,

P[0; 0.5] = (exp(-0.5)) (0.5^0) / (0!) = 0.6065
P[1; 0.5] = (exp(-0.5)) (0.5^1) / (1!) = 0.3033
P[2; 0.5] = (exp(-0.5)) (0.5^2) / (2!) = 0.0758
P[3; 0.5] = (exp(-0.5)) (0.5^3) / (3!) = 0.0126
P[4; 0.5] = (exp(-0.5)) (0.5^4) / (4!) = 0.0016

etc.

There are other ways of looking at this situation, such as the binomial expansion (e.g., Pascal's triangle), but the results should be the same. (Of course, as they are all interrelated.) In particular, for various "clumps" and "clusters" of bits in a random distribution, e.g., "how often does the pattern '101111011' occur in random sequences of length 30," one could explicitly calculate the permutations and combinations, exactly as one would calculate hands of poker or bridge, a well-known easy problem in statistics and probability.

I like the Poisson approach as a first cut because it works for any form of random process where some value is "expected" and one wants to know the distribution of "actuals": clicks on a Geiger counter in some period, number of persons ahead of one in a supermarket checkout line, number of Prussian soldiers kicked to death each year by their horses, etc. The Poisson also works for finite distributions, e.g., a sequence of 10 digits, whereas Gaussian probabilities are best for continuous/unlimited distributions. (In the limit, of course, they are the same.)

I hope this helps.

(BTW, the Poisson distribution is what I used several months ago to show that the probability of finding a key with a random search (no coordination between various searchers) is sufficiently high. If the _expected_ value of finding the key is, say, "1" (effort the same as a coordinated search), then the probability of _not_ finding the key is given by P[0; 1] = exp(-1) = 36.8%. If twice the effort is put into this, then P[0; 2] = exp(-2) = 13.5%, and so on. So, with several times the effort, the probability the key has not been found is low.)
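[The Poisson numbers above can be checked with a short script; the helper name is mine, the formula is the one quoted in the post.]

```python
from math import exp, factorial

def poisson(s, m):
    # P[s; m] = e^(-m) * m^s / s!
    # Probability of seeing s occurrences when m are expected.
    return exp(-m) * m ** s / factorial(s)

# Reproduce the m = 0.5 table from the post.
for s in range(5):
    print(s, round(poisson(s, 0.5), 4))
```

The loop prints 0.6065, 0.3033, 0.0758, 0.0126, 0.0016 for s = 0..4, matching the table above, and the terms for any fixed m sum to 1.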
--Tim May

"The government announcement is disastrous," said Jim Bidzos... "We warned IBM that the National Security Agency would try to twist their technology." [NYT, 1996-10-02]

We got computers, we're tapping phone lines, I know that that ain't allowed.

Timothy C. May              | Crypto Anarchy: encryption, digital money,
tcmay@got.net 408-728-0152  | anonymous networks, digital pseudonyms, zero
W.A.S.T.E.: Corralitos, CA  | knowledge, reputations, information markets,
Higher Power: 2^1,257,787-1 | black markets, collapse of governments.
"National borders aren't even speed bumps on the information superhighway."
Timothy C. May wrote:
It obviously depends on your definitions of period, gap, clusters, patterns, etc. These would have to be pinned down before a precise answer could be given. For example, if the sequence is 30 digits long, or 5 digits long, etc.
Yes ... the more one looks at the "random sequence" question, the more questions arise! Again, referring to Knuth: anything that takes as many pages in one of his books as "Definition of Random Sequence" does is certainly nontrivial.

A "gap" in my current inquiry is the number of bit positions between occurrences of a specific bit-pattern. My current investigation uses same-bit repetitions (00, 11), but of course that generalizes to any specific n-bit pattern (e.g. we should get the same results **in a perfectly random sequence** whether we count gaps between occurrences of the pattern "11" or the pattern "01").
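[This comparison can be run directly. A minimal sketch, with names and sizes of my own choosing: tabulate start-to-start distances between successive, possibly overlapping, occurrences of a pattern in a pseudo-random bit string.]

```python
import random

def gaps_between(pattern, n_bits=200_000, seed=7):
    # Start-to-start distances between successive (possibly overlapping)
    # occurrences of `pattern` in a pseudo-random bit string.
    rng = random.Random(seed)
    bits = ''.join(str(rng.getrandbits(1)) for _ in range(n_bits))
    starts = []
    i = bits.find(pattern)
    while i != -1:
        starts.append(i)
        i = bits.find(pattern, i + 1)
    return [b - a for a, b in zip(starts, starts[1:])]

for p in ('11', '01'):
    g = gaps_between(p)
    print(p, round(sum(g) / len(g), 2))
```

One caveat worth noting: any 2-bit pattern occurs at a given position with probability 1/4, so the *mean* spacing is about 4 for both "11" and "01"; but the *shape* of the gap distribution can differ for self-overlapping patterns like "11", which tend to cluster.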
But the general _framework_ for looking at this problem would probably be the Poisson distribution. That is, if the _expected_ ("average") number of gaps, runs, whatever, is "m", then the number "s" of actual gaps/runs that is 0, 1, 2, 3, 4, ...., n, is given by the Poisson distribution:
P [s; m] = (exp (-m)) (m ^ s) / (s!)
(I read this like this: "The Poisson probability of seeing s occurrences given that m are expected is e to the minus m times m to the s divided by s factorial." I literally remember the formula from this verbal pattern, having used it much in my earlier career.)
I'm going to have to chew on this ... but thanks for the Poisson hint. Actually, it looks like some graphs I've generated just might have some similarity to a Poisson distribution! Bears more thought. I certainly appreciate your time in replying in such depth. It'll take a few days to digest. Thanks again, Regards
participants (3)
- Douglas B. Renner
- geeman@best.com
- Timothy C. May