Re: Anonglish (was: Re: Authenticating Meat)

30 Apr 2003

      John Kelsey wrote:
...
At 03:42 PM 4/28/03 +0100, Peter Fairbrother wrote:
...
If you have perfect compression, and you encrypt a message which has been
compressed, any decryption will look sensible.
You do understand that building this kind of compressor implies passing the
Turing test, right?  For the messages to be sensible, they have to have
some underlying meaning that makes sense.  This isn't just compression in
the sense of fast implementations of statistical models of text....
I do realise that. More, it has to be able to fake the sender, not just a
random human.

I'm not trying to build that sort of compressor tho' - but see my ps.

The compressor I'm beginning to build now does not have to pass a Turing
test directly. It can only compress a limited subset of possible messages,
and if that subset is small it's easy to see that it can be done.

Say your possible messages are:

Attack at dawn
Attack at dusk
Retreat at dawn
Retreat at dusk

Assign a number to the verb, and a number to the "time" (not being a
grammarian I don't know offhand what that part of speech is called). In this
limited case that's just two bits, so eg "Attack at dawn" compresses as
0x00.

Now encrypt that 0x00, using eg an XOR with a key of 10, to give ciphertext
0x10. Decrypting that with key 00 gives message 0x10 - which decompresses to
"Attack at dusk", a plausible decryption*.

There are further considerations when variable sentence structures,
multi-sentence messages (and lots more things) are considered, of course.
For instance, longer messages have to be self-consistent, which can be done
using closeness arrays and best-fit techniques. And doing it on a wider
scale is harder, and a whole lot of work...

* However, if "Attack at dusk" is an unlikely message because of real-world
events, eg you have already won the battle, then the decryption loses some
plausibility...

There are several ways around that. First is to have a "godlike" compressor
which knows everything in the real world, or at least everything any sender
is likely to send, so that _all_ possible decryptions are real-world
plausible, but that's not within my ability to write. It's impossible anyway
(unless you're God).

Second is to just accept that only a portion of possible decryptions will be
real world plausible (most, if not all, should be language-plausible and
self-consistent-plausible). It shouldn't be hard to get a small proportion
to be rwp.

This is still very useful, as an attacker can't distinguish between a
brute-forced set of real-world plausible decryptions (a subset of all
possible decryptions, which should be large enough to contain many examples
of contradictory decryptions), and a purportedly real decryption can be
challenged by producing a different real-world plausible decryption, or
preferably ten thousand of them.

Having a fake key that decrypts to a rwp decryption can be done, if the fake
key is prearranged before the message is sent. Useful when lots of messages
are encrypted with the same key. If you can check the decryption first, you
can also afterwards select a key that will give a rwpd.

Third is to try and get almost all decryptions to be rwp, using complex
techniques (!) and the fact that the set of messages that can be sent is
limited. For instance, if it was limited as in my example above and you
wanted to tell someone that you couldn't go on a promised date this evening,
you would send "retreat at dusk". This is a very contrived example, of
course.

Unfortunately you still can't give a randomly-chosen-afterwards key which
will _always_ give a rwpd, which would be _very_ nice to do.

I'm investigating a few possible ways to do that, perhaps just to do it
effectively without 100% of possible decryptions being rwp, but I haven't
gotten any results worth repeating yet.

And yes, I do know the theory that says it's impossible. Change the
conditions a little and the theory might not be applicable any more.

-- 
Peter Fairbrother

ps  I did some experiments a couple of years ago and got (some) rwp
decryptions in most 60-word messages, and in some 200-word messages. The
parser used was surprisingly important. Only tried at most a few hundred
trial decryption/decompressions per message, but I didn't get anyone else to
check the rwp, so the results may have been a bit subjective.

That was not super-perfect tho', just an attempt to approach perfect.