Re: why compression doesn't perfectly even out entropy
At 10:07 AM 4/16/96 -0400, Perry E. Metzger wrote:
...Usually, random sequences are non-compressable, but it is possible (though very improbable) for Hamlet to appear out of a random number generator, and it is of course quite compressable...
But even if it came from a completely random source, it would still make a bad one-time pad. When people say "compressable" or "algorithmic complexity" or "random", a context is always implied. In the context of "fair coin flips" the text of Hamlet is NOT compressible. Because no string is more likely than any other. Any algorithm that could compress that string, will, on the average INCREASE the length of "fair coin flip" strings it tries to compress. Under the context of "pads that might be used for cryptographic purposes" the text of Hamlet is quite compressible. An attacker is much more likely to test for such a stream than one that appears more random. So, even if you got "Hamlet" from a perfectly random source, you should reject it for crypto purposes. There is an exception to this rule. If you are so revered as a cryptographer that no analyst would believe that you would deliberately choose a non-random pad, then it would be safe to use Hamlet if it appeared in a random source. It is amazing that one's reputation can affect the randomness of the bitstrings one uses. PS: I have written a compressor that can compress ANY string to the single byte "X". There are 2**n different decompressor programs, where n is the bit size of the original file. All you have to do is specify the number of the correct decompressor program, and you have the original file. Note that no computer is required for either the compressor or the decompressor. (patent pending)
rick hoselton writes:
At 10:07 AM 4/16/96 -0400, Perry E. Metzger wrote:
...Usually, random sequences are non-compressable, but it is possible (though very improbable) for Hamlet to appear out of a random number generator, and it is of course quite compressable...
But even if it came from a completely random source, it would still make a bad one-time pad.
No it wouldn't. There is a tiny but nonzero probability that xoring your one time pad with your text will result in a cyphertext equal to, say, the Bible. Big deal. If the key is really random, the cryptanalyst has no way to tell what the underlying text was.
In the context of "fair coin flips" the text of Hamlet is NOT compressible.
Huh? There is only one context in which things are compressable or not -- is there a smaller representation for them.
Because no string is more likely than any other. Any algorithm that could compress that string, will, on the average INCREASE the length of "fair coin flip" strings it tries to compress.
True enough, but the claim was that a random string has no representation which is smaller than itself. .pm
"Perry E. Metzger" writes:
In the context of "fair coin flips" the text of Hamlet is NOT compressible.
Huh?
There is only one context in which things are compressable or not -- is there a smaller representation for them.
Then I propose the following compression algorithm to compress your "random" one-time pad of 2 million bits with value k. The algorithm will decompress the input bit "1" to k, and decompress the input bit "0" to the bit-string "10101010". Therefore your "random" pad is compressible to exactly one bit, and is not "random" as you supposed. "Smaller representation" indeed. The decompression *algorithm* must be accounted for in the "representation" of the compressed text, otherwise an arbitrary amount of information may be stored in the algorithm itself. Hamming codes offer a way to compress any bit stream. They move whatever patterns they can find in independent 8-bit segments into the coding alphabet, and replace them with shorter strings. If you don't save the alphabet, you can't decompress the stream, and have lost information that was originally in the stream. If an OTP generator accidentally chooses "Hamlet", big deal. As long as your opponent believes that you have a good OTP generator he has no reason to try "Hamlet" before any other pad, so Hamlet's compressibility as english text is irrelevant.
-----BEGIN PGP SIGNED MESSAGE----- In article <199604161843.LAA19508@toad.com>, rick hoselton <hoz@univel.telescan.com> wrote:
At 10:07 AM 4/16/96 -0400, Perry E. Metzger wrote:
...Usually, random sequences are non-compressable, but it is possible (though very improbable) for Hamlet to appear out of a random number generator, and it is of course quite compressable...
But even if it came from a completely random source, it would still make a bad one-time pad. When people say "compressable" or "algorithmic complexity" or "random", a context is always implied.
In the context of "fair coin flips" the text of Hamlet is NOT compressible. Because no string is more likely than any other. Any algorithm that could compress that string, will, on the average INCREASE the length of "fair coin flip" strings it tries to compress.
Under the context of "pads that might be used for cryptographic purposes" the text of Hamlet is quite compressible. An attacker is much more likely to test for such a stream than one that appears more random. So, even if you got "Hamlet" from a perfectly random source, you should reject it for crypto purposes.
This thread is becoming isomorphic to one that took place on the Coderpunks list. Jonathan Wienke was promoting an idea to make the output of a PRNG "more" random by throwing away output whose statistics didn't match the ideal statistics of an ideal RNG. Critics of this scheme (including Perry) argued along these lines: Suppose you think that quotes from Hamlet don't belong in your OTP keystream, and so you filter them out. In doing so, you are making your keystream *less* random, not more, because you are making some bit sequences more likely than others. Given that Hamlet quotes aren't very likely, you aren't making your keystream very much weaker, but you *are* weakening it. See the Coderpunks archives for more details on this argument. - -- Alan Bostick | They say in online country there is no middle way mailto:abostick@netcom.com | You'll either be a Usenet man or a thug for the CDA news:alt.grelb | Simon Spero (after Tom Glazer) http://www.alumni.caltech.edu/~abostick -----BEGIN PGP SIGNATURE----- Version: 2.6.2 iQB1AwUBMXUQuuVevBgtmhnpAQHZpgMApBbI3CPieZc/V/vQt5vAqHX/XcRqWjg3 Rilta9XizlIfq7BYS4NKefov7t2kAW+cgsWESC17rJ7gkXCYIsdvaGg4q1uunDG+ 0MXhL406zQbcsPy3iUROGHFIz+IRvkNY =qjiR -----END PGP SIGNATURE-----
participants (4)
-
abostick@netcom.com -
Perry E. Metzger -
rick hoselton -
Scott Brickner