On Mon, 12 Feb 2001, Trei, Peter wrote:
> I realize that this is *slightly* simplistic, but comparing 2 (preferably 3 or more) copies of the data with different watermark contents should quickly reveal where and what constitutes the watermarking.
Not really. If the original version is not available, a properly constructed watermark basically amounts to a noise component, and comparing two noisy versions of the same data will not give you enough statistics to recover the noise completely.

Averaging attacks (averaging over multiple independently marked copies to try to fade out the mark; an optimal attack if the marks are independent and flatly distributed) can be seen as a channel distortion, which isn't too difficult to compensate for given sufficient redundancy in the mark. A good watermarking scheme resists averaging over ten or twenty independently marked copies. Beyond this, it becomes quite difficult to find paying customers ready to participate in collusion on a regular basis. Anyway, this is what CRM people are counting on.

Sampo Syreeni <decoy@iki.fi>, aka decoy, student/math/Helsinki university
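
To make the averaging argument concrete, here is a rough Python sketch. It is a toy model, not any particular deployed scheme: it assumes an additive mark of independent +1/-1 values, a Gaussian host signal, and a detector that holds the original and correlates the residual against each customer's mark. All names and parameters (`make_mark`, `embed`, `correlate`, `n`, `alpha`, `k`) are made up for illustration. Averaging k copies scales each colluder's mark by 1/k, but with a long enough mark the correlation still stands well clear of the score for a customer who was not involved.

```python
import random

def make_mark(n, rng):
    # Hypothetical mark: n independent +1/-1 values, one mark per customer.
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def embed(host, mark, alpha):
    # Additive embedding: the mark is low-amplitude noise on top of the host.
    return [h + alpha * m for h, m in zip(host, mark)]

def average(copies):
    # Collusion by averaging: sample-wise mean over all marked copies.
    k = len(copies)
    return [sum(vals) / k for vals in zip(*copies)]

def correlate(signal, host, mark):
    # Assumed non-blind detector: subtract the original, then correlate
    # the residual with one customer's mark.
    n = len(mark)
    return sum((s - h) * m for s, h, m in zip(signal, host, mark)) / n

rng = random.Random(1)
n, alpha, k = 20000, 1.0, 20          # mark length, strength, colluders

host = [rng.gauss(0.0, 10.0) for _ in range(n)]
marks = [make_mark(n, rng) for _ in range(k)]
copies = [embed(host, m, alpha) for m in marks]

attacked = average(copies)

# A colluder's mark survives at roughly alpha / k; an uninvolved
# customer's mark correlates near zero.
colluder_score = correlate(attacked, host, marks[0])
innocent_score = correlate(attacked, host, make_mark(n, rng))
threshold = alpha / (2 * k)

print("colluder:", colluder_score, "innocent:", innocent_score)
print("colluder flagged:", colluder_score > threshold)
print("innocent flagged:", innocent_score > threshold)
```

In this model the spread of the innocent score shrinks like 1/sqrt(n), so a longer mark (more redundancy) buys tolerance for more colluders. A detector without access to the original would also see the host signal as noise and would need a longer mark still.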