even more steganography talk
Another way to describe a successful steganography system... I am the opponent. I possess a collection of files that might contain hidden encrypted messages. My task is to determine whether they do contain hidden encrypted messages. A casual inspection of the files does not reveal any bit patterns that deviate significantly from patterns found in most examples of these kinds of files. However, I suspect these files contain hidden messages that were deposited using a steganography algorithm initialized from a public-key-generated initialization vector. To test my hypothesis, I will reverse the steganography process using a large collection of public keys and then examine the resulting bit sequences.

If the steganography algorithm is a good one, reversing the steg process will produce a sequence of bits that appears relatively random, even if there is *no* hidden message. What does "appears relatively random" really mean? How do you measure the randomness of a sequence of bits? I'm not an expert in this field, but I would guess you could measure the randomness by attempting to compress the bit sequence. If the bit sequence does not compress much, it is relatively random. How much is "not much"? In other words, what threshold compression percentage should you use to declare one bit sequence random and another not random? I don't know.

To generalize: an opponent will perform some kind of test to determine whether reversing the steg process produces a random bit sequence or a non-random one. The test will have some threshold value below which a sequence is declared random. If the output of the reverse steganography step always falls below the threshold, even when there is no hidden message, then the opponent will not be able to determine whether a file contains a hidden message.

Jim_Miller@suite.com
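Jim's compression heuristic can be sketched concretely. The following is a minimal illustration using zlib; the function name `looks_random` and the 0.95 cutoff are my own illustrative choices, and the threshold is exactly as arbitrary as his "how much is not much?" question suggests:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size over original size; a ratio near (or above) 1.0
    means the data did not compress much, i.e. it looks random."""
    return len(zlib.compress(data, level=9)) / len(data)

def looks_random(data: bytes, threshold: float = 0.95) -> bool:
    # The threshold here is an uncalibrated, illustrative value.
    return compression_ratio(data) >= threshold

random_bits = os.urandom(4096)   # stand-in for a reversed-steg output
structured = b"A" * 4096         # highly regular data, compresses well

print(looks_random(random_bits))  # random bytes barely compress
print(looks_random(structured))   # repetitive data compresses a lot
```

A scheme passes Jim's criterion when `looks_random` returns true for its reversed-steg output whether or not a message is present.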
What does "appears relatively random" really mean? How do you measure the randomness of a sequence of bits?
Randomness is the wrong measure. Suppose I take 2^10 random bits and prepend 16 zeros. How random is this? Almost as random, and this can be made precise. How compressible is it? Almost incompressible. Now, what about 2^20 bits? 2^30? It is not randomness but recognizability which is at issue. Then the next issue arises.
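A quick experiment illustrates the point: prepending a perfectly recognizable sixteen-zero-bit prefix leaves the data essentially incompressible, so a compression-based measure alone misses recognizable structure. A small sketch (variable names are mine):

```python
import os
import zlib

random_block = os.urandom(128)          # 2^10 random bits
prefixed = b"\x00\x00" + random_block   # prepend 16 zero bits

r_plain = len(zlib.compress(random_block)) / len(random_block)
r_prefixed = len(zlib.compress(prefixed)) / len(prefixed)

# Both ratios come out near (or above) 1.0: the zero prefix is
# perfectly recognizable to an opponent who knows to look for it,
# yet the data remains "incompressible" by this measure.
print(r_plain, r_prefixed)
```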
If the reverse steg process makes it look like all, or even many, files contain hidden messages, even when they don't, then you can plausibly deny knowledge of a suspicious bit pattern in any specific file.
The situation of one file is the wrong problem. Suppose you have a collection of files. What you want is deniability for the group of files as a whole. This is much trickier, and the obvious thing doesn't work.

Suppose the files contain some bytes of an RSA-encrypted session key concatenated to the bytes of a file encrypted with the session key. This is a reasonable scheme, and is basically how a stealth-PGP might work. Because the mode of representation is concatenation, the session key is represented as some arbitrary number X mod N, the public key modulus. Recall that N is public. Now let k be the length of N in bits, rounded up to the nearest multiple of eight; since the encrypted key is represented as bytes, its bit length is a multiple of eight. The probability that a random number between 0 and 2^k will be less than N is N/2^k. Easy. If N is not chosen specifically with this purpose in mind, the fraction N/2^k is on average about 1/4. The important thing is not that this number is small but that it is less than one; call it p.

Now take an arbitrary string of bits, apply the (public) extraction technique for a given public key, and from this extract a candidate for the encrypted session key. You can then check the candidate against the modulus. If the candidate is greater than or equal to the modulus, you can reject that public key as a possible recipient of that message. Each time a file is *not* rejected as a possible message for a particular recipient key, the probability multiplies by p; the probability that a key rejects none of a group of innocent files therefore becomes exponentially small. You could even check all possible keys. The opponent may not be able to identify the recipient, but in aggregate will be able to ascertain that messages are being sent. That is sufficient: steganography seeks to hide not only individual messages, but also the fact that communication is taking place at all. There are some defenses.
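The rejection test described above can be sketched as follows, assuming the concatenation scheme; `candidate_rejected` and the 2048-bit parameters are illustrative, not taken from any actual stealth-PGP implementation:

```python
import secrets

def candidate_rejected(candidate_bytes: bytes, modulus: int) -> bool:
    """Reject a public key if the extracted candidate session-key block,
    read as an integer, is at least the key's modulus N: a genuine RSA
    ciphertext is always strictly less than N."""
    return int.from_bytes(candidate_bytes, "big") >= modulus

def survival_probability(n_files: int, p: float) -> float:
    """Probability that a key rejects none of n_files innocent files."""
    return p ** n_files

# Hypothetical 2048-bit odd modulus with the top bit set.
N = secrets.randbits(2048) | (1 << 2047) | 1
k = 2048
p = N / 2 ** k   # acceptance probability for a truly random block

# Here k equals the bit length of N, so p >= 1/2; rounding k up to a
# byte boundary beyond the modulus length is what pushes the average
# toward the ~1/4 figure in the text. Either way p < 1, which is what
# makes the aggregate test work:
print(survival_probability(20, p))
```

For, say, 20 innocent files, a key surviving every check is exponentially unlikely, so repeated survival is strong evidence of real traffic.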
One can look for public keys which give high N/2^k ratios. Unfortunately, this almost assuredly makes factoring the modulus easier, if only by lowering the search space.

One can make sure the collection of files contains some ringers (dummy files constructed to fail the test), such that the ratio of ringers to real messages is (2^k - N):N, so that the overall acceptance rate matches that of random data. This is certainly possible if one is simply storing files, but if the collection of files were intercepted in transit, the sender would have to make sure to send files in the correct ratio. Yet this requires that the sender look out for you and your security!

What is most broken here is the N/2^k ratio itself, that is, the artifact of the byte-oriented encoding. In other words, a random modular number is not random in the byte-length representation. More to the point, one can't simply lop the front off a PGP message and get stealth-PGP.

So one way to solve this is to introduce some indeterminism into the modular representation, so that the session key is evenly distributed in all of its relevant representations. This would mean that every value in the range [0..2^k) was a valid encoding, and was taken mod N to decrypt a session key. This yields non-random session keys mod N, which might be acceptable, since the entropy of the modular distribution doesn't drop all that much. Still, this requires the sender's software to be secure.

Another way would be to use arithmetic coding to spread the N/2^k ratio throughout the whole file. For an exact solution one would have to use rational coefficients rather than 2-adic coefficients, but an approximate solution should be adequate. In the approximate case, however, one needs an estimate of the candidate acceptance rate p above to make sure that the approximation is good enough. This solution doesn't require the sender's software to be any more secure than is in the sender's interest.

In steganography, like cryptography, the different layers of abstraction forcibly interfere with each other.
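The first defense above (randomizing the modular representation so that every k-bit block is a valid encoding) might look like the following sketch; the names `lift`/`recover` and the toy parameters are my own illustrative choices:

```python
import secrets

def lift(c: int, N: int, k: int) -> int:
    """Choose a uniformly random representative of c mod N in [0, 2^k).
    Every k-bit block then decodes to *some* value below N, so the
    candidate-versus-modulus rejection test no longer distinguishes
    real messages from random data."""
    assert 0 <= c < N < 2 ** k
    # How many of c, c+N, c+2N, ... fall below 2^k.
    count = (2 ** k - c + N - 1) // N
    return c + secrets.randbelow(count) * N

def recover(x: int, N: int) -> int:
    """The receiver reduces mod N to get the RSA ciphertext back."""
    return x % N

# Tiny illustrative parameters; a real modulus would be 1024+ bits.
N, k = 1009, 16
c = 123              # pretend RSA-encrypted session key, c < N
x = lift(c, N, k)    # the k-bit block actually placed in the file
assert recover(x, N) == c
```

Note the caveat acknowledged in the text: the lifted blocks are only approximately uniform on [0, 2^k) overall, but every block is now a valid encoding, which is what defeats the hard rejection test.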
The pun here was that an RSA key (represented by a modular integer) was being put into a different representation where it didn't work. This kind of level-shifting behavior is all too common, and is the cause of much protocol failure.

Eric
participants (2)
- hughes@ah.com
- jim@bilbo.suite.com