Re: entropy

Eric Hollander writes:
I seem to remember that English text is about 1.5 bits per character. I can find a reference if you're interested.
There are lots of entropies available to measure. There is the "true" entropy, which is the lower bound for all other entropy measures; this is the compressibility limit. The entropy I was referring to was simply the single-character entropy. That is, the probabilities p_i in the entropy expression are the probabilities that a given single character appears in the text. This will be higher than the true entropy.

Shannon's estimate for H_1 was 4.03 bits/character. This assumes a 27-character alphabet (26 letters plus space). The entropy for ASCII-represented English will be higher because of punctuation and capitals. The true entropy of English is much lower than this, of course. But for a simple measure to automatically distinguish between plaintext and ciphertext, it should suffice.

Re: uuencoding. In my earlier analysis I assumed that the uuencoded data was random. If it is not random data, then the entropy will be lower.

Thanks for the clarification.

Eric
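
For concreteness, here is a minimal sketch of the single-character entropy measure discussed above, written in Python. The function name `single_char_entropy` and the choice to treat each byte as one character are assumptions made for illustration; they do not come from the original discussion.

```python
import math
from collections import Counter


def single_char_entropy(data: bytes) -> float:
    """Empirical single-character entropy H_1 of data, in bits per character.

    Each byte is treated as one character (an assumption of this sketch).
    p_i is estimated as the relative frequency of byte value i in the sample.
    """
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # maps byte value -> number of occurrences
    # H_1 = -sum_i p_i * log2(p_i), summed over byte values that occur
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    import os

    sample_text = b"the quick brown fox jumps over the lazy dog " * 50
    print("text  :", single_char_entropy(sample_text))
    print("random:", single_char_entropy(os.urandom(65536)))
```

On ordinary English ASCII text this figure should come out well below 8 bits per byte, while a long run of random or encrypted bytes should approach 8. Uuencoded random data is capped at about 6 bits per character, since uuencode draws on a 64-character set. Any cutoff used to separate plaintext from ciphertext would have to be chosen empirically.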