Tagging data to detect thieves

Thu Mar 11 23:56:36 PST 1993

Mark Ringuette asks about schemes to detect which copies of some 
proprietary information were used to resell the data.

I recall reading a paper on this in the proceedings of one of the crypto 
conferences within the past several years.  Unfortunately, I don't have 
a more accurate reference handy.  The authors referred to this problem 
as "digital fingerprinting" (i.e. adding a "fingerprint" to each copy of 
a document).

As I recall, the idea was to twiddle bits in such a way that any subset 
of copies up to a specified size would have a certain number of 
identically twiddled bits.  The thiefs who cross-correlate 64 (or 
however many) copies will not know about the bit twiddles which were 
common to all 64 copies.  Their output will still contain those common 
bit-twiddles, and this information allows the thiefs to be caught.  The 
paper shows a formula for the number of possible bit-twiddle-places and 
the number of bit-twiddles per copy needed, as a function of how many 
copies you are defending against the bad guys getting.  It was basically 
just a combinatorial/counting argument.

I do seem to recall that if the bad guys could get a lot of copies the 
number of bits needed grew exponentially.  I don't know whether 
defeating an attack with 64 copies was practical using this scheme.

Mark also asked about secret sharing.  The classic secret sharing paper 
is "How to Share a Secret"; I think it was by Shamir, in an old CACM 
from the 70's.  As I recall, he proposed encoding the data as a K-1 
degree polynomial in some modulus field.  Give each person a point on 
the polynomial.  K points are required to recover the polynomial.  I 
don't recall how the encoding of the data as a polynomial was to be 
done, but the author showed that K-1 points gives you no information 
about it.

Hal
74076.1041 at compuserve.com