I recently came across a real-world use of steganography which hides extra data in the LSB of CD audio tracks to allow (according to the vendor) the equivalent of 20-bit samples instead of 16-bit and assorted other features. According to the vendors, "HDCD has been used in the recording of more than 5,000 CD titles, which include more than 250 Billboard Top 200 recordings and more than 175 GRAMMY nominations", so it's already fairly widely deployed.
[...]

Hidden Code Addition/Output Dither/Quantization

The final step in the reduction to 16 bits is to add high-frequency weighted dither and round the signal to 16-bit precision. The dither increases in amplitude in the frequency range of 16 to 22.05 kHz, leaving the noise floor flat below 16 kHz where the critical bands of hearing associated with tonality occur. As part of the final quantization, a pseudo-random noise hidden code is inserted as needed into the least significant bit (LSB) of the audio data. The hidden code carries the decimation filter selection and Peak Extend and Low Level Range Extend parameters. Inserted only 2-5 percent of the time, the hidden code is completely inaudible, effectively producing full 16-bit undecoded playback resolution. The result is an industry-standard 44.1-kHz, 16-bit recording compatible with all CD replication equipment and consumer CD players.

[...]

The paper describing the process is available under the somewhat misleading name http://www.hdcd.com/partners/proaudio/AES_Paper.pdf. The description of the stego en/decoding process is on p.15 (it's a rather long excerpt, but it's interesting stuff):

As part of the final quantization, a hidden code side channel is inserted into the LSB when it is necessary for the encoder to inform the decoder of any change in the encoding algorithm. It takes the form of a pseudo-random noise encoded bit stream which occupies the least significant bit temporarily, leaving the full 16 bits for the program material most of the time. Normally, the LSB is used for the command function less than five percent of the time, typically only one to two percent for most music. Because the hidden code is present for a small fraction of the time and because it is used as dither for the remaining 15 bits when it is inserted, it is inaudible. This was confirmed experimentally with insertion at several times the normal fraction of time.

[...]
The mechanism which allows insertion of commands only when needed consists of encapsulating the command word and parameter data in a "packet". A synchronizing pattern is prepended to the data and a checksum is appended. The resulting packet is then scrambled using a feedback shift register with a maximal length sequence and inserted serially, one bit per sample, into the LSB of the audio data. The decoder sends the LSB's of the audio data to a complementary shift register to unscramble the command data. A pattern matching circuit looks for the synchronizing pattern in the output of the descrambler, and when it finds it, it attempts to recover a command. If the command has a legal format and the checksum matches, it is registered as a valid packet for that channel. The arrival of a valid packet for a channel resets a code detect timer for that channel. If both channels have active timers, then code is deemed to be present and the filter select data is considered valid immediately. However, any command data which would effect the level of the signal must match between the two channels in order to take effect. The primary reason for this is to handle the case where an error on one channel destroys the code. In such a case, the decoder will mistrack for a short time until the next command comes along, which is much less audible than a change in gain on only one channel, causing a shift in balance and lateral image movement. If either of the code detect timers times out, then code is deemed not to be present, and all commands are canceled, returning the decode system to its default state. If the conditions on the encoder side are not changing, then command packets are inserted on a regular basis to keep the code detect timers in the decoder active and to update the decoder if one starts playing a selection in the middle of a continuous recording. 
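The packet mechanism described above can be sketched roughly as follows. This is only an illustration: the paper does not publish the real synchronizing pattern, packet layout, or checksum, so SYNC, PAYLOAD_BITS, CHECK_BITS and the toy checksum below are all made-up stand-ins, and the scrambling step is left out to keep the framing visible.

```python
# Hypothetical values: the real HDCD sync pattern, packet layout and
# checksum are not given in the paper excerpt.
SYNC = [1, 1, 1, 0, 1, 0, 0, 1]   # made-up 8-bit synchronizing pattern
PAYLOAD_BITS = 8                   # made-up command + parameter width
CHECK_BITS = 4                     # made-up checksum width


def int_to_bits(value, width):
    """Return `value` as a most-significant-bit-first list of 0/1."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def bits_to_int(bits):
    out = 0
    for b in bits:
        out = (out << 1) | b
    return out


def checksum(payload_bits):
    """Toy checksum (an assumption): count of 1 bits modulo 2**CHECK_BITS."""
    return sum(payload_bits) % (1 << CHECK_BITS)


def build_packet(command):
    """Prepend the sync pattern and append the checksum."""
    payload = int_to_bits(command, PAYLOAD_BITS)
    return SYNC + payload + int_to_bits(checksum(payload), CHECK_BITS)


def embed(samples, packet, start):
    """Overwrite the LSB of one sample per packet bit, beginning at `start`."""
    out = list(samples)
    for i, bit in enumerate(packet):
        out[start + i] = (out[start + i] & ~1) | bit
    return out


def find_commands(samples):
    """Scan the LSB stream for SYNC; keep commands whose checksum matches."""
    lsbs = [s & 1 for s in samples]
    n = len(SYNC) + PAYLOAD_BITS + CHECK_BITS
    found = []
    for i in range(len(lsbs) - n + 1):
        if lsbs[i:i + len(SYNC)] != SYNC:
            continue
        payload = lsbs[i + len(SYNC):i + len(SYNC) + PAYLOAD_BITS]
        check = lsbs[i + len(SYNC) + PAYLOAD_BITS:i + n]
        if bits_to_int(check) == checksum(payload):
            found.append((i, bits_to_int(payload)))
    return found


# Usage: 100 even-valued samples (all LSBs zero), one packet at offset 10.
audio = [2 * k for k in range(-50, 50)]
marked = embed(audio, build_packet(0xA5), start=10)
commands = find_commands(marked)
```

Each marked sample differs from the original by at most one LSB, which is why a player that ignores the code still reproduces essentially the same audio.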
Since the decoder is constantly scanning the output of the de-scrambler shift register for valid command packets even when none are present, the possibility exists that there may be a false trigger. For audio generated by the encoder, this possibility is eliminated in the absence of storage and transmission errors by having the encoder scan the LSB of the audio data looking for a match. If a match to the synchronizing pattern is found, the encoder inverts one LSB to destroy it. Modern digital storage and transmission media incorporate fairly sophisticated error detection and correction systems. Therefore, we felt that only moderate precautions were necessary in this system. The most likely result of an error in the signal is a missed command, which can result in a temporary mistracking of the decoding, as mentioned above. Given the low density of command data, and the small changes to the signal which the process uses, these errors are seldom more audible than the error would be in the absence of the process. The chances of a storage error being falsely interpreted as a command are extremely small. For material not recorded using the encoder, a small probability for a false trigger does exist. Given a moderate length for the scrambling shift register so that its mapping behaves in a noise-like fashion and a choice of synchronizing pattern which avoids patterns likely to appear in audio data with a higher than average probability, susceptibility to false triggers can be made arbitrarily small by increasing the length of the part of the packet requiring a match. In the case of the current system, the combination of the synchronizing pattern with the bit equivalence for all valid commands plus check sum results in a required match equivalent to 39 sequential bits.
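As a rough sanity check on the 39-bit figure, the single-channel false-trigger rate can be estimated in a few lines. This sketch assumes the descrambled LSBs of non-encoded material behave like independent fair coin flips and that every sample position is a candidate match. Both are simplifications, and the function name is invented for illustration.

```python
def seconds_between_false_triggers(match_bits, sample_rate=44_100):
    """Expected seconds between chance matches on one channel.

    Assumes each sample position independently matches with
    probability 2**-match_bits (a simplifying assumption).
    """
    p_per_position = 2.0 ** -match_bits
    return 1.0 / (sample_rate * p_per_position)


single_channel_days = seconds_between_false_triggers(39) / 86_400
print(f"about {single_channel_days:.0f} days of audio per false match")
```

Under these assumptions a single channel would see a chance match roughly once per 144 days of continuous audio. Any stereo figure depends on further conditions (both channels matching close together in time and agreeing on gain) that this sketch does not model.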
For a stereo signal, in which a match must occur in both channels within a one second interval and the commands in both channels must specify the same gains, this amounts to an expectation of one event in approximately 150 million years of audio.

The scrambling operation uses a feedback shift register designed for a maximal length sequence in which data taken from taps in the register are added using modulo two arithmetic, equivalent to an 'exclusive or' operation, and fed back to the input of the register. For a given register length there are certain configurations of taps which will produce a sequence of one and zero values at the output that does not repeat until 2^N - 1 values have emerged, where N is the length of the shift register. This corresponds to the number of possible states of the shift register minus one illegal state, and is called a maximal length sequence. Such an output sequence has very noise-like properties and, in fact, is the basis of some noise generators. We use the noise-like behavior of the generator to scramble the command signals by adding them modulo two to the input of the shift register, as for example in Figure 7a. This has the advantage that a second similar shift register with taps in the same places but with only feed forward addition modulo two (Figure 7b) will reproduce the original input sequence when fed with the output of the first one. The fact that the decode side has no feedback means that the initialization requirements are limited to having N input samples prior to the beginning of decoding, which means that the decoder will "lock up" very quickly. In this scheme, the presence of a bit error anywhere in the length of a packet plus initialization sequence will completely scramble the data, preventing recovery. However, in practice, this has not been a problem for reasons described above.

Peter.
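The feedback/feed-forward register pair described in the excerpt (Figures 7a and 7b) can be sketched as a self-synchronizing scrambler. The register length and tap positions below are assumptions chosen for illustration (a common 7-bit maximal-length configuration). The excerpt does not state which length or taps HDCD actually uses.

```python
from collections import deque

N = 7            # hypothetical register length
TAPS = (6, 7)    # hypothetical tap delays, in samples (a 7-bit m-sequence)


def scramble(bits, state=None):
    """Feedback form: output = input XOR earlier *output* bits at the taps."""
    # reg[0] holds the most recent output bit, reg[N-1] the oldest.
    reg = deque(state if state is not None else [0] * N, maxlen=N)
    out = []
    for b in bits:
        y = b
        for t in TAPS:
            y ^= reg[t - 1]
        reg.appendleft(y)      # feed the scrambled bit back into the register
        out.append(y)
    return out


def descramble(bits, state=None):
    """Feed-forward form: output = input XOR earlier *input* bits at the taps."""
    reg = deque(state if state is not None else [0] * N, maxlen=N)
    out = []
    for b in bits:
        x = b
        for t in TAPS:
            x ^= reg[t - 1]
        reg.appendleft(b)      # only received bits enter the register
        out.append(x)
    return out


# Usage: round-trip a short command-like bit pattern.
data = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
sent = scramble(data)
recovered = descramble(sent)

# With an all-zero input and a non-zero starting state, the feedback
# register free-runs and emits its maximal-length sequence.
free_run = scramble([0] * 254, state=[1, 0, 0, 0, 0, 0, 0])
```

Because the descrambler has no feedback, a decoder that starts with the wrong register contents produces correct output once N received bits have passed through it, which matches the "lock up very quickly" remark in the excerpt.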
--------------------------------------------------------------------- The Cryptography Mailing List Unsubscribe by sending "unsubscribe cryptography" to majordomo@wasabisystems.com
--On Tuesday, 01 October, 2002 13:54 +1200 Peter Gutmann <pgut001@cs.auckland.ac.nz> wrote:
> I recently came across a real-world use of steganography which hides extra data in the LSB of CD audio tracks to allow (according to the vendor) the equivalent of 20-bit samples instead of 16-bit and assorted other features. According to the vendors, "HDCD has been used in the recording of more than 5,000 CD titles, which include more than 250 Billboard Top 200 recordings and more than 175 GRAMMY nominations", so it's already fairly widely deployed.
maybe. i'm not sure how many players support it (my spectral D/A convertor does, but then some of the people at spectral seem to have invented HDCD). while the CDs i have that use it sound pretty good, i don't have any good way to compare them when played back over a non-HDCD capable convertor (i could hook up one of my computer CD drives, but that doesn't seem fair compared to the spectral transport-D/A combination).

but when i do play such CDs on other gear, i don't notice any audible degradation, so it isn't obviously harmful.

i've seen comments in reviews of professional CD mastering gear that there are other, seemingly preferred, technologies, although i've never found details of them.

-paul
On Mon, Sep 30, 2002 at 07:31:19PM -0700, Paul Krumviede wrote:
| --On Tuesday, 01 October, 2002 13:54 +1200 Peter Gutmann
| <pgut001@cs.auckland.ac.nz> wrote:
|
| >I recently came across a real-world use of steganography which hides extra
| >data in the LSB of CD audio tracks to allow (according to the vendor) the
| >equivalent of 20-bit samples instead of 16-bit and assorted other
| >features. According to the vendors, "HDCD has been used in the recording
| >of more than 5,000 CD titles, which include more than 250 Billboard Top
| >200 recordings and more than 175 GRAMMY nominations", so it's already
| >fairly widely deployed.
...
| i've seen comments in reviews of professional CD mastering
| gear that there are other, seemingly preferred, technologies,
| although i've never found details of them.

The two that spring to mind are HDCD and XRCD. I'd never dug into how they're recorded, being much more interested in playing with things closer to the output stage, like speaker resonance control and electrical hum elimination...

Adam

--
"It is seldom that liberty of any kind is lost all at once." -Hume
Paul Krumviede wrote:
| --On Tuesday, 01 October, 2002 13:54 +1200 Peter Gutmann
|
| maybe. i'm not sure how many players support it (my spectral D/A
| convertor does, but then some of the people at spectral seem to
| have invented HDCD). while the CDs i have that use it sound
| pretty good, i don't have any good way to compare them when
| played back over a non-HDCD capable convertor (i could hook
| up one of my computer CD drives, but that doesn't seem fair
| compared to the spectral transport-D/A combination).
|

The extra 4 bits add quite a bit, subjectively. I've compared the same CD on the same system with an HDCD player and non-HDCD player.

| but when i do play such CDs on other gear, i don't notice any
| audible degradation, so it isn't obviously harmful.
|
| i've seen comments in reviews of professional CD mastering
| gear that there are other, seemingly preferred, technologies,
| although i've never found details of them.
|

The other formats of note are probably SACD and then DVD-Audio. SACD is multichannel 16-bit/44.1kHz... so multichannel CD without additional sample resolution (if I recall). SACD is not "backwards compatible" though, whereas HDCD is. DVD-Audio is really the way to go, though... 24-bit/96kHz multichannel or up to 192kHz two-channel. Lots more bits, lots more samples. It makes a huge difference on "pretty good or better" gear.

Regards,
Jeremey.

--
Jeremey Barrett [jeremey@rot26.com] Key: http://rot26.com/gpg.asc
GnuPG fingerprint: 716E C811 C6D9 2B31 685D 008F F715 EB88 52F6 3860
Peter Gutmann wrote:
> I recently came across a real-world use of steganography which hides extra data in the LSB of CD audio tracks to allow (according to the vendor) the equivalent of 20-bit samples instead of 16-bit and assorted other features. According to the vendors, "HDCD has been used in the recording of more than 5,000 CD titles, which include more than 250 Billboard Top 200 recordings and more than 175 GRAMMY nominations", so it's already fairly widely deployed.
Yeah, right - and green felt-tip around the edges of your CD improves the sound, too.

Cheers,
Ben.

--
http://www.apache-ssl.org/ben.html http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
On 2002-10-01, Ben Laurie uttered to Peter Gutmann:
> Yeah, right - and green felt-tip around the edges of your CD improves the sound, too.
I'm not sure about HDCD as a technology, but the principle is sound. If we can compress sound transparently, we can also transparently embed quite a lot of data into the part which is perceptually irrelevant. We might also depart from perceptual equivalence and go with perceptual similarity instead -- e.g. multiband compress the audio, and embed data which allows us to expand to a higher perceptual resolution. Whatever the implementation, putting data in the gap between statistical (i.e. computed against a Markov model) and perceptual (against a perceptual similarity model) entropy which compensates for some of the perceptual shortcomings (like total dynamic range) of a particular recording technology seems like an excellent idea.

However, applications like these have very little to do with steganography proper. In this case, we can (and want to) fill up the entire gap between statistical and perceptual entropy estimates with useful data, leaving us with signals which have statistical entropies consistently higher than we'd expect of a typical recording with similar perceptual characteristics. That is, the encoded signal will appear manifestly random compared to typical unencoded material from a similar source, and we can easily see there is hidden communication going on. Such encodings will be of little value in the context of industrial strength steganography used for hidden communication.

Steganography used in the latter sense will also have to be imperceptible, true, but here the entropic gap we're filling is the one between the entropy estimates of our best model of the source material vs. that of the adversary's, be the models Markov ones, perceptual, something else, or composites of the above. Consequently the margin is much thinner (bandwidths are probably at least a decade or two lower), and the aims remain completely separate.
Consequently, I don't believe encodings developed for the first purpose could ever be the best ones for the latter, or that HDCD-like endeavors really have that much to do with the subject matter of this list.

--
Sampo Syreeni, aka decoy - mailto:decoy@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Peter Gutmann wrote:
> I recently came across a real-world use of steganography which hides extra data in the LSB of CD audio tracks to allow (according to the vendor) the equivalent of 20-bit samples instead of 16-bit and assorted other features.
I don't think that's really 'steganography' per se, since no attempt is made to hide the fact that the information is in there. The quasi-stego used is just to prevent bad audio artifacts from happening.

-Bram Cohen

"Markets can remain irrational longer than you can remain solvent" -- John Maynard Keynes
At 09:38 PM 09/30/2002 -0700, Bram Cohen wrote:
> Peter Gutmann wrote:
>> I recently came across a real-world use of steganography which hides extra data in the LSB of CD audio tracks to allow (according to the vendor) the equivalent of 20-bit samples instead of 16-bit and assorted other features.
> I don't think that's really 'steganography' per se, since no attempt is made to hide the fact that the information is in there. The quasi-stego used is just to prevent bad audio artifacts from happening.
Traditional digital telephone signalling uses a "robbed-bit" method that steals the low-order bit from every sixth voice sample to carry information like whether the line is busy or idle or wants to set up a connection. (That's why you only get 56kbps and not 64kbps in some US formats, since it doesn't want to keep track of which low bits got robbed.) In a sense both of these are steganography, because they're trying to hide the data channel from the audio listener by being low level noise in ways that equipment that isn't looking for it won't notice. That's not really much different from encoding Secret Data in the LSB of uncompressed graphics or audio - it's about the second-crudest form of the stuff, and if you think there are Attackers trying to decide if you're using stego, you need more sophisticated stego - at minimum, encoding the stegotext so it looks like random noise, or encoding the stegotext with statistics resembling the real noise patterns, or whatever. The definition of "hidden writing" doesn't specify how hard you tried to hide it or how hard the Attacker is looking - you need to Bring Your Own Threat Model. Since I don't speak Audiophile Engineering / Human perceptual modelspeak, which the paper was written in, I wasn't able to figure out where the HDCD stuff hides the extra bits. Are they really there (in the CDROM's error-correction bits or something)? It sounded like they were either saying that they make part-time use of the one LSB bit to somehow encode the LSB and 4 more bits, which sounded really unlikely given that there weren't any equations there about the compression models, or else that they had some perceptual model and were using that to make a better choice of LSB than a simple 50% cut-off of the A-to-D converter (more absolute distortion, but better-sounding distortion.) 
Or did I miss the implications of the reference to oversampling and the real difference is that HDCD disks really have more pixels on the disk with only the LSB different, so a conventional reader reads it fine but needs the ECC to get the LSB? A separate question is - "so is there some internet-accessible list of disks using HDCD, or do I just have to look at the labels for a logo?"
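The robbed-bit arithmetic mentioned earlier in this message works out as below. The sketch assumes standard telephone PCM (8000 samples per second, 8 bits per sample). The names are illustrative only.

```python
SAMPLES_PER_SECOND = 8_000   # telephone PCM sampling rate
BITS_PER_SAMPLE = 8


def channel_rate(usable_bits_per_sample):
    """Bits per second available when only some bits of each sample are trusted."""
    return SAMPLES_PER_SECOND * usable_bits_per_sample


full_rate = channel_rate(BITS_PER_SAMPLE)        # every bit usable
safe_rate = channel_rate(BITS_PER_SAMPLE - 1)    # never rely on the low bit

# The signalling itself only takes the low bit of every sixth sample.
signalling_rate = SAMPLES_PER_SECOND / 6

print(full_rate, safe_rate, round(signalling_rate))
```

Giving up the low bit of every sample costs 8 kbit/s even though the signalling only needs about a sixth of that, which is the price of not tracking which samples were robbed.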
participants (8)
- Adam Shostack
- Ben Laurie
- Bill Stewart
- Bram Cohen
- Jeremey Barrett
- Paul Krumviede
- pgut001@cs.auckland.ac.nz
- Sampo Syreeni