Entropy, WNSTORM and steganography
rarachel@prism.poly.edu (Arsen Ray Arachelian):
In a previous post you mentioned that PGP does high entropy... Do you have any C source code that finds the entropy of a chunk of data? (I've written a cypher program that hides the cyphertext in a stream of random numbers.)
Entropy is: sigma(- q_i * log q_i), for all i where q_i is the frequency of token i occurring in the data stream. I don't know where I've put my old entropy program, but I cooked one up now, attached to the end of the mail.
Anyway, I'd like to put in an entropy checker into the program. You may have seen me post a notice for it. It's called WNSTORM. I sent it to soda, I don't
I don't get it. OK, maybe if you see "Entropy 1.0" you may feel more secure that the white noise is white noise, but I'm sure you're using some decent generator anyway. As far as using entropy to attempt to make the input (noise) and output (with embedded data) statistically similar goes, it's hardly enough. Entropy measure is not the most sophisticated of analysis techniques! If the real use of WNSTORM is to modify it for stego, to put things into the low bits, then entropy is *definitely* not a great method of ensuring that your stegoed image will be statistically similar to the original. There have been earlier discussions on methods of ensuring that the percentage of 0s and 1s remains similar before and after stegging (I just love that verb; I steg, you steg, he stegs, thou steggeth ;-) I personally believe, based on my not inconsiderable experience working with images both from the image-processing-programming and the digital-effect-touchup points of view, that very minor changes in images tend to be noticable to the human eye, after the right preprocessing. 'Ultimate' steganography may have to bother about very sophisticated statistical modelling, or neural networks (I know that many number theorists, and Bruce Schneier, intensely dislike the latter. They are quite useful, however, in building complex models on data with which one may have no idea what to do). I'm waiting for a large collection of 'before and after' stego images, to play with them and see what I find. (I once worked on a model to recognize faces, fast, by generating a pixel-density graph of monochrome edge-outlined images. Though the project died before the computer properly recognized a face, I could identify faces from their 'densitographs'.) -----
know if it's up there yet. I haven't checked in a while. Anyhow unfortunatly since you're in India I can't send you a copy. I wish I could, but I don't want the damned ITAR cops on my ass. (Now if you were to obtain an account in the USA, or one that looks like a USA address, you could get it yourself without my intervention or knowledge... for all I know you probably have it already :-)
Probably... ;-) ----------------------------------- // this ought to work ;-) double entropy(FILE *fp) { double count[256]; // frequency of chars int c, i; double entr= 0; for (i=256; i--; count[i]=0); while((c=fgetc(fp)) != -1) { // for every char, count[c]++; // inc its count length++; // and the length } for (i=256; i--; count[i]/= length); // convert counts to frequencies 0..1 // sigma(0..255, -q_i * log_2(q_i)), -q_i bcoz log of fraction will be // negative, we'd like our entropy between 0..1, not 0..-1 for (i=256; i--; entropy+= -count[i] * log_base_2(count[i])); return entr; // bits_of_info per BYTE, as we counted 256 values. } ------------------------------------------- ------------------------------------------------------------------------------- Rishab Aiyer Ghosh "What is civilisation rishab@dxm.ernet.in but a ribonucleic Voicemail +91 11 3760335; Vox/Fax/Data 6853410 hangover?" H-34C Saket New Delhi 110017 INDIA -------------------------------------------------------------------------------
Thanks for the algorithm... (I didn't find such a beast in my statistics books, so, I'll use yours as I mentioned earlier...) Actually when I came up with WNSTORM, I knew nothing about cyphers or crypts, and had no idea about what PK systems were... I was a clueless crypto-virgin... But somehow the idea snuck into my head that I could emulate frequency hopping transmissions with a computer, and do it far better than in the physical world. Again, by now, you know how WNSTORM works, so for the others on this list I'll recap.... Basically WNSTORM takes in a byte of plaintext, splits it into its idividual bits and scatters these bits into a random number window of variable size. The random window can be anywhere from 2 bytes to the limit set by the user. (WNSTORM.C handles a limit of upto 31 bytes per rnd window, although chaning a single #define would get around this.) Two arrays are used for this purpose: DataBit[i] and DataByte[i]. DataBit array contains bit values (ie: 1,2,4,8,16...128.) These can be moved around. ie: if DataBit[2]=128, this means that in the current window, what was 2^2 or bit value 4 in the plaintext is now bit 7 (or bit value 128) in the cyphertext. However, you also need to look at the DataByte[2] array to see which byte this actual bit lives in. If dataByte[2]==7 then our bit is in (stream[2] & 128). For each plaintext character a window/stream of random numbers is generated. The size of this channel is determined by a maxchnl variable. This value is mod'ed with limitchnl which the user sets. This is to prevent out of bounds errors. The DataBit[] array elements are either swapped, rotated, interlaced, or otherwise shuffled. The DataByte array elements are chosen on each pass based on random values and the passkey. All these actions are based on some formulas which take in the passphrase and the previous random number window. Obviously making a single change in the cyphertext will cause the total loss of transmission for the rest of the file... Now, I did insert a somewhat "smart" statistical bit-fix routine that would correct changes made by the insertion of the cyphertext bits into the random number window. Since any bit can be 1 or 0, there's a 50% chance that a bit targeted for replacement by a cyphertext bit will change. The odds of a whole byte not changing are very slim of course (1/2)^8, however the bitfix function will for all eight cyphertext bits will try to see if the target bit was changed. If it was it will try to find a byte with the opposite value in another byte. (ie: if we clear bit 128 in byte four, the bitfix function may set bit 128 in byte two.) If the bitfix fails to find a corresponding free bit in the stream, it will set another free bit of whatever value it can find. The bitfix function targets its "victim" bits (ie: those bits in the random number window which were not replaced by the cyphertext bits) randomly so that there won't be much of a chance of detecting the changes made by the bitfix function... The bitfix function is only used durring encryption. It makes no difference for decryption since the algorithm uses the past window of data for the next commands, so any changes made in the current window won't have any ill effects. Now, for the purposes of random numbers, the Borland C 3.1's random number generator is kinda shitty, so I've put in an option to allow WNSTORM to read random numbers from a device or file. This would allow an external hardware device (or device driver) to be hooked into WNSTORM. This also allows WNSTORM to be used for steganography. In a Stego mode, two more programs are needed to interface with WNSTORM. They are extractors and injectors. These are format dependant. They may either extract the low bytes of an image, sound, or other media, or if enough data is available to hide the cyphertext, they may extract the low bit(s) of each byte in the media... The injector does the opposite of the extractor. While the extractor removes data from the media, the injector will take the cyphertext output of WNSTORM and inject it back into the media in the same place where the extraactor removed it. As an aside, the bitfix function does not use the random device for picking its victim bits. The reason for this is that if it did, it would "eat" up data from a possible stego lsb file which would cause major problems in injecting the output back in. Originally I didn't intend for WNSTORM to be used for stego, however, not using it for stego has a big disadvantage (or two.) Primarily, it produces cyphertext that's about 0.5*limitchnl in size. (ie: many times the size of the plaintext you wish to send.) However, using a large window size helps the security of WNSTORM because fewer bits in the stego file are modified, so there's less of a chance of detecting the presence of stego... Another problem with not using it for stego is that you should have a random number generator in hardware with a device driver to talk to it. This is because whatever compiler you use will have a poor random number generator, whose idiosyncrasies could be sniffed out and compared to the cyphertext produced by WNSTORM, so it might be possible to sniff out which bits of the stream are used. However, these weaknesses aside, I'd like some suggestions for a way of attacking this algorithm to sniff out more weaknesses. How would one go about performing cryptanalysis on a cypher which uses random garbage to hide and to encrypt? Certainly chosen plaintext attacks will always fail because encrypting the same text with the same password 100 times will produce 100 different cyphertexts... (Perhaps a good use for this is in cypherpunk anon-encrypted remailers???) The one attack I devised in WNSTORM's eariler incarnation is now plugged up (in the previous version I split the plaintext into two halves and hid the nibbles in the random noise stream. I also didn't use the random numbers in the window which were not replaced by cyphertext. The attack would have been to do statistics on the nibbles, and also to move the whole cyphertext into a RAM drive and interatively change one bit, decrypt the text, see if there's any difference, if there is, the last bit we changed was used. This could give you a map of the used/unused bits. Neither of these attacks will work.) I realize that I'm still an amateur at cyphers and I'm still learning, so my attacks on this program will be limited... So, any of you have any suggestions? (I did notice a lack of interest in this... I posted up announcements for WNSTORM a few weeks ago, and got only two messages from interested cpunks... So anyone interested in helping determine the strength of this cypher?)
participants (2)
-
rarachel@prism.poly.edu -
rishab@dxm.ernet.in