What does "appears relatively random" really mean? How do you measure the randomness of a sequence of bits?
Randomness is the wrong measure. Suppose I take 2^10 random bits and prepend 16 zeros. How random is the result? Almost as random, and this can be made precise. How compressible is it? Almost incompressible. Now, what about 2^20 bits? 2^30? It is not randomness but recognizability that is at issue. And then the next issue arises.
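To make the point concrete, here is a small illustrative sketch (not a proof) using zlib as a stand-in compressor: prepending 16 zero bits to a block of random bits leaves it essentially incompressible.

```python
import os
import zlib

# 2^10 random bits, then the same bits with 16 zero bits prepended.
random_bits = os.urandom(2**10 // 8)
with_prefix = b"\x00\x00" + random_bits

plain = zlib.compress(random_bits, 9)
prefixed = zlib.compress(with_prefix, 9)

# Both typically come out at or above their original size: random data
# is incompressible, and the 16-bit zero prefix saves nothing useful.
print(len(random_bits), len(plain))
print(len(with_prefix), len(prefixed))
```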
If the reverse-steg process makes it look like all, or even many, files contain hidden messages, even when they don't, then you can plausibly deny knowledge of a suspicious bit pattern in any specific file.
The situation of one file is the wrong problem. Suppose you have a collection of files. What you want is deniability for the group of files as a whole. This is much trickier, and the obvious thing doesn't work.

Suppose the files contain the bytes of an RSA-encrypted session key concatenated with the bytes of a file encrypted under that session key. This is a reasonable scheme, and is basically how a stealth-PGP might work. Because the mode of representation is concatenation, the session key is represented as some arbitrary number X mod N, where N is the public key modulus. Recall that N is public. Now let k be the length of N in bits, rounded up to the nearest multiple of eight; since the encrypted key is represented as bytes, its bit length is a multiple of eight. The probability that a random number between 0 and 2^k is less than N is N/2^k. Easy. If N is not chosen specifically with this purpose in mind, the fraction N/2^k is on average about 1/4. The important thing is not that this number is small but that it is less than one; call it p.

Now take an arbitrary string of bits, apply the (public) extraction technique for a given public key, and from this extract a candidate for the encrypted session key. You can then check the candidate against the modulus. If the candidate is greater than the modulus, you can reject that public key as a possible recipient of that message. Each time a file is not rejected as a possible message with respect to a particular recipient key, the probability of an innocent explanation drops by a factor of p; the probability that a public key rejects none of a group of files therefore becomes exponentially small. You could even check all possible keys. You may not be able to identify the recipient, but in aggregate the opponent will be able to ascertain that messages are being sent. That is sufficient: steganography seeks to hide not only individual messages, but also the fact that communication is taking place at all. There are some defenses.
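The rejection test above is trivial to code. Here is a minimal Python sketch with a toy modulus (assumed for illustration, not real RSA parameters); the function and variable names are mine:

```python
import secrets

def rejects(candidate_bytes: bytes, modulus: int) -> bool:
    """The public test: a byte string cannot be a ciphertext mod N
    if, read as an integer, it is >= N."""
    return int.from_bytes(candidate_bytes, "big") >= modulus

# Toy "modulus" with its top bits clear, so that N/2^k is well below one.
N = 0x2F_4A_81_D3_95
k_bytes = (N.bit_length() + 7) // 8   # byte-aligned length, k = 8*k_bytes bits

# Feed the test random byte strings, as if extracted from innocent files.
trials = 10_000
rejected = sum(rejects(secrets.token_bytes(k_bytes), N) for _ in range(trials))

# Each file that survives the test multiplies the innocent-explanation
# probability by p = N / 2^k, so a whole collection of survivors
# becomes exponentially implausible.
p = N / 2 ** (8 * k_bytes)
print(f"empirical accept rate {1 - rejected / trials:.3f}, p = {p:.3f}")
```

A real opponent would run this same loop over every known public key and every intercepted file.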
One can look for public keys which give high N/2^k ratios. Unfortunately, this almost assuredly makes factoring the modulus easier, if only by lowering the search space.

One can make sure the collection of files contains some ringers, such that the ratio of ringers to real messages is (2^k - N) : N. This is certainly possible if one is simply storing files, but if the collection were intercepted in transit, the sender would have to take care to send files in the correct ratio. Yet this requires that the sender look out for you and your security!

What is most broken here is the N/2^k ratio itself, that is, the artifact of the byte-oriented encoding. In other words, a random modular number is not random in its byte-length representation. More to the point, one can't simply lop the front off a PGP message and get stealth-PGP. So one way to solve this is to introduce some indeterminism into the modular representation, so that the session key is evenly distributed in all of its relevant representations. This would mean that every value in the range [0..2^k) was valid, and was taken mod N to recover the encrypted session key. This yields non-uniform session keys mod N, which might be acceptable, since the entropy of the modular distribution doesn't drop all that much. Still, this requires the sender's software to be secure.

Another way would be to use arithmetic coding to spread the N/2^k ratio throughout the whole file. For an exact solution, one would have to use rational coefficients rather than 2-adic coefficients, but an approximate solution should be adequate. In the approximate case, however, one needs an estimate of the candidate acceptance rate p above to make sure that the approximation is good enough. This solution doesn't require the sender's software to be any more secure than is in the sender's interest.

In steganography, as in cryptography, the different layers of abstraction forcibly interfere with each other.
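One way the "indeterminism in the modular representation" idea might be realized (a sketch under my own naming, not a description of any actual stealth-PGP): instead of sending the ciphertext c itself, send c + j*N for a random j chosen so the result still fits in k bits. The wire bytes are then nearly uniform on [0..2^k), and the receiver just reduces mod N.

```python
import secrets

def lift(c: int, N: int, k: int) -> int:
    """Re-randomize a ciphertext c (0 <= c < N) into a k-bit wire
    value: a uniform element of {x < 2**k : x % N == c}."""
    count = (2 ** k - 1 - c) // N + 1   # how many lifts c + j*N fit below 2^k
    return c + N * secrets.randbelow(count)

def drop(x: int, N: int) -> int:
    """Receiver side: recover the ciphertext by reducing mod N."""
    return x % N

# Toy parameters (assumed, not real RSA):
N = 0x2F_4A_81_D3_95
k = 8 * ((N.bit_length() + 7) // 8)     # byte-aligned bit length
c = 123_456_789 % N
x = lift(c, N, k)
print(x < 2 ** k, drop(x, N) == c)
```

Note the residual non-uniformity: values of c with one extra lift below 2^k are sent slightly more often, which matches the text's remark that the entropy of the modular distribution drops only slightly.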
The pun here was that an RSA key (represented by a modular integer) was being put into a different representation where it didn't work. This kind of level-shifting behavior is all too common, and is the cause of much protocol failure.

Eric