
The monarchist lightning-hater has been cracking the whip again.
Where is highly sophisticated stego?
Adam Back suggested several areas for investigation, including English text. Some weeks ago somebody suggested using synonyms in natural language to express stegodata. A synonym for a word in the message would be chosen from a defined set, and the choice made would express a bit or several. This seems related to the clever "alternate dictionary" idea in RFC-1938. But the lack of true synonyms (or of programs that understand natural language) makes it look pretty hopeless as a practical proposition.

SOURCE CODE A POSSIBLE CHANNEL

Even small programs often have several hundred symbols. The names given to these are mostly arbitrary, and a program can replace them (consistently) with other names. In fact "shrouded" source code is sometimes released to make it less clear to the human reader than the original source. In this application the names would be chosen randomly from a set of words and syllables (perhaps with some rules about which are plausible together). There is also a choice of StyleWithCaps and style_with_underscores. If the symbols (in order of first appearance) all have a CRC calculated on them I'd guess you could get 4 bits each without difficulty.

There are also comments. These might be natural language, debugging code and never-really-got-this-working code. Maybe a CRC on each comment is worth 8 bits. Comments of the form

    /* Thanks to P.McNicholas, M.Davies and WorpleSword for advising on the next section **/

could be added with random names quite well (although the user might have to mark the complicated code fragments suitable for receiving such attributions). Names used could be lifted from a list of known helpful programmers ...

And then there is layout: the kind of thing that is standardised with indent(1). This would be anti-standard in layout, choosing spacing and line breaks with the stegomessage in mind.

And some code constructions can be rewritten in an equivalent form to add info to the message.
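To make the rewriting idea concrete, here is a minimal Python sketch in which the operand order of a comparison against zero carries one bit. Everything in it is an assumption for illustration: the names PAIR, embed_order and extract_order are hypothetical, and the regular expression only recognises a whole parenthesised test such as (x == 0) or (0 == x) on a plain identifier. A real tool would work from a parse tree and would have to skip strings and comments.

```python
import re

# Hypothetical, deliberately narrow pattern: a parenthesised equality test of a
# plain identifier against the literal 0, in either operand order.
# Group 1 is set for "(name == 0)", group 2 for "(0 == name)".
PAIR = re.compile(r"\(\s*(?:([A-Za-z_]\w*)\s*==\s*0|0\s*==\s*([A-Za-z_]\w*))\s*\)")


def embed_order(source, bits):
    """Rewrite each matching test so its operand order encodes the next bit.

    Bit 0 is written as (name == 0), bit 1 as (0 == name). Tests beyond the
    end of the message are left exactly as they were.
    """
    queue = list(bits)

    def rewrite(match):
        if not queue:
            return match.group(0)
        name = match.group(1) or match.group(2)
        return f"(0 == {name})" if queue.pop(0) else f"({name} == 0)"

    result = PAIR.sub(rewrite, source)
    if queue:
        raise ValueError("not enough comparison sites for the message")
    return result


def extract_order(source):
    """Read one bit per matching test, in order of appearance."""
    return [0 if match.group(1) else 1 for match in PAIR.finditer(source)]


if __name__ == "__main__":
    code = "if (x == 0) a(); while (0 == n) b(); if (k==0) c();"
    hidden = embed_order(code, [1, 0, 1])
    print(hidden)
    print(extract_order(hidden))
```

This yields only one bit per site, and the reader has to know the message length some other way, since every remaining test also reads back as a bit.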
    0==x , x==0
    x>y  , y<x
    i++  , ++i       (sometimes)
    for  , while
    if   , switch    (sometimes)

Or variables could be added to hold intermediate values, lengthening the source, which here is our desired effect.

There may be more scope for plausible stego in code that doesn't have to work. Next time you see a load of verbose mistyped rubbish called "Help me with my program!!!" it could be someone who got this idea ahead of me.

I've been dipping into "Understanding and Writing Compilers" (Bornat, 1979; free, withdrawn from library). It seems the early phases of a compiler build the symbol table as a tree and do various bits of analysis on the source. A good starting point for a program might be a compiler: rip out the translation and optimisation phases and add the stegoreader and stegogenerator.

KEYED V. UNKEYED STEGO

Assuming that the data hidden in stegofiles is random (keys or cyphertext), the enemy is supposed to be unable to tell whether there is really a message there. An example of unkeyed stego is the use of the LSB of each byte in an image or audio file. In contrast, the subliminal channel mechanisms written about by Schneier use a key known to the intended reader.

In this application keyed stego would seem to have an advantage over unkeyed, in that the source could be made to keep to a house style while retaining some stego capacity. The seeds for the CRCs and the style info (resembling ./.indent.pro or ~/.indent.pro) could be user-defined. I still wouldn't want to hide plaintext under it, but it would give more flexibility for the users.

Would anyone like to write a version? Then we could test my guesses about how many bits can be hidden. It probably won't be close to 1 bit in 8, but I think 1 in 30 should be within reach.

--
Peter Allan peter.allan@aeat.co.uk
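As a possible starting point for such a version, here is a rough Python sketch of the symbol-CRC idea with a user-defined seed. It makes several assumptions that are not in the post: the word list WORDS, the RESERVED set, the helper names, and the use of a regular expression in place of a compiler's symbol table are all hypothetical, and strings and comments are not treated specially. The seed is passed as the starting value of zlib.crc32. A CRC is not a cryptographic function, so this only illustrates the mechanics.

```python
import re
import zlib
from itertools import count, product

IDENT = re.compile(r"\b[A-Za-z_]\w*\b")
# Hypothetical and incomplete: names that must never be renamed.
RESERVED = {"int", "char", "void", "if", "else", "for", "while", "return"}
# Hypothetical vocabulary; a real tool would want a larger, more plausible one.
WORDS = ["buf", "count", "index", "node", "temp", "size", "next", "flag"]


def nibble(name, seed):
    """Low 4 bits of a seeded CRC-32 over the symbol name."""
    return zlib.crc32(name.encode(), seed) & 0xF


def symbols(source):
    """Non-reserved identifiers, in order of first appearance."""
    seen = []
    for match in IDENT.finditer(source):
        name = match.group(0)
        if name not in RESERVED and name not in seen:
            seen.append(name)
    return seen


def candidates():
    """Endless stream of word_word names, then word_word1, word_word2, ..."""
    for n in count():
        for a, b in product(WORDS, repeat=2):
            yield f"{a}_{b}" if n == 0 else f"{a}_{b}{n}"


def embed_names(source, nibbles, seed):
    """Rename the first len(nibbles) symbols so each one's CRC nibble matches."""
    old = symbols(source)
    if len(nibbles) > len(old):
        raise ValueError("not enough symbols for the message")
    used = set(old) | RESERVED  # avoid clashing with names already present
    mapping = {}
    for name, want in zip(old, nibbles):
        new = next(c for c in candidates()
                   if c not in used and nibble(c, seed) == want)
        used.add(new)
        mapping[name] = new
    return IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), source)


def extract_names(source, seed):
    """Recompute the nibble of every symbol, in order of first appearance."""
    return [nibble(name, seed) for name in symbols(source)]


if __name__ == "__main__":
    code = "int f(int a, int b) { int c = a + b; return c; }"
    hidden = embed_names(code, [0x3, 0xA, 0xF, 0x0], seed=1996)
    print(hidden)
    print(extract_names(hidden, seed=1996)[:4])
```

Each renamed symbol carries 4 bits, matching the guess above. Measuring hidden bits against the size of the output file would be one way to test the 1-in-30 estimate.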