Steganography and Steganalysis

kqb at whscad1.att.com kqb at whscad1.att.com
Tue May 25 14:49:12 PDT 1993


I have received some useful feedback to yesterday's message on
steganography and steganalysis.  Here are some clarifications
to my cryptic presentation and a correction.

I was most interested in finding if the steganographic capacity of
English is high enough to make steganography practical for everyday
use, so I didn't even address the meaningfulness of the output.
For example, if I could only produce a capacity of a tenth of one percent,
the meaningfulness would not even be an issue because nobody would
want to send large messages via steganography anyway.  A capacity of
10%, requiring the public text to be only 10 times as long as the
hidden text, may be good enough for everyday use.  If that can be
achieved, then the next step is to see if meaningful output can
have a high steganographic content.  If so, then I expect that
several cypherpunks would want to pursue that.  (FYI: I plan to do
more analysis on my own, even if nobody else does.)

My guesstimate for the steganographic capacity of English did not
provide a steganographic algorithm.  For example, I haven't even
looked into how to map a bit string to a parenthesis grouping;
I was just noting that if you have X(N-1) possibilities, there must
be log (X(N-1)) bits available, assuming all possibilities are equally
likely.  Is there a simple-to-compute mapping of the numbers 1 through
X(N-1) to the X(N-1) parenthesizations of an N word sentence?
Fortunately, N rarely gets large for ordinary English sentences,
so a general solution may be unnecessary.

My presentation mistakenly implied that a good steganographic algorithm
may have the form:
    E(K, M) = E2( E1(K,M) )
where E1 is a cryptographically secure encryption function with public
key K and hidden message M, E2 somehow converts the encrypted message to
ordinary English text, and E1, K, and E2 are publicly known.
Unfortunately, if the inverse of E2 (let's call it D2), is easily found,
then the presence of a hidden message can be detected easily, even though
that message cannot be decrypted easily.  This is because the output of E1,
which is incompressible, is easily distinguishable from
D2(ordinary English text).  Here is a better formulation for the
steganographic schema:
    E(K1, K2, M) = E3( E2(K2, E1(K1,M) ))
where:
  E1(K1,M) converts the hidden message M to a cryptographically secure
      cyphertext by using the key K1.  E1 and K1 are public, but the
      decryption function D1 is difficult to compute without the
      private key PK1.
  E3(C) converts a bit string to ordinary-looking English text.
      Assume that both E3 and its inverse D3 are public.
  E2(K2, C) converts the cyphertext C into another bit string such
      that E2(K2, C) has the same statistical characteristics as
      D3(ordinary English text).  Assume that E2 and K2 are public,
      but D2 is difficult to compute without the private key PK2.
Function E1 is normal public key cryptography, which produces an
incompressible cyphertext.  I hope that function E3 has a high enough
steganographic capacity to make steganalysis worthwhile.  Function E2
cannot be a normal encryption function because its output needs to be
as compressible as D3(ordinary English text).  Both functions E2 and E3
are new types of functions that require more research to work well.

I still haven't seen any references to this type of steganography
being done before, but thanks to the various people who gave pointers
to tools that may help in building it.

                              Kevin Q. Brown
                              INTERNET    kqb at whscad1.att.com
                                 or       kevin_q_brown at att.com






More information about the cypherpunks-legacy mailing list