Tagging data to detect thieves
I've done some further thinking on the text tagging problem, spurred by a question on sci.crypt about tagging pictures (under the subject line "Permanent signatures for pictures"). Here's a summary.

----

Let's say Dow Jones wants to sell newswire subscriptions to individuals, but someone is anonymously forwarding their articles to a newsgroup. Can they succeed in tagging the text to detect the thief? The idea is to make some small twiddle to each subscriber's copy of the text, so that the stolen copy can be matched with some subscriber and their subscription cancelled.

Short answer: the thieves win. At first, I thought the answer was the opposite.

----

There are two issues which must be addressed in order to show that the tagger wins:

1. The taggee must not be able to "smooth away" all of the tag bits.

2. The taggee must not be able to cross-correlate multiple copies of the data in question in order to produce a "clean" version.

Regarding issue #1, the basic technique is to alter a few features of your data which are important enough that your opponent can't afford to randomize ALL such bits. In the case of text, small changes in word choice are a good candidate. Two criteria are:

A. The changes must be "important" enough that the thief can't smooth them all away.

B. The changes shouldn't be "important" enough that the newswire becomes worthless!

The tagger has an advantage in this case, though. He can change, say, 1 in 1000 of these "important, non-smoothable-away" candidate bits. If the thief wants to cancel them out and only has a single copy of the data, he must somehow canonicalize _all_ of the candidate tag bits, or some very large proportion of them.

So if your tagging process does a little bit of "damage" to your data, like in the map-maker case of adding an extra small street here and there, then the opponent must either try to detect exactly where your damage is, or must make wholesale changes to the data (such as removing all small roads altogether).
The thief, in trying to cover up your damage, must make a thousand times as much damage. Choose your damage level appropriately so that your level of damage isn't too much but the thief's is.

----

Issue #2, thieves cross-correlating between multiple copies of the data, is a bit more subtle. Here's the scenario:

Dow Jones has 10,000 customers, 64 of whom are in a conspiracy to steal and re-sell the newswire. Dow Jones tries various tagging strategies, altering whitespace and word choice individually for each subscriber. The thieves try to cross-correlate between their copies of the text in order to "cancel out" the tags from the copy which they wish to re-sell. Can Dow Jones detect the thieves and cancel their subscriptions?

In the discussion below, when Dow Jones "twiddles a bit" of their newswire, they do so by substituting a word's synonym at a chosen location, using a separate (possibly biased) coin flip for each subscriber.

Here are the strategies I've considered.

Dow Jones strategy: twiddle some bits with probability 0.5. If the thieves use majority vote, each thief will have a reasonably high correlation with the output bits. (In fact, the probability of a match will exceed 50% by approximately the chance of a tie vote among the thieves, which is about 0.8/sqrt(n) where n is the number of thieves. This computation is a bit hairy.)
Thief countermeasure: reliably detect which bits are being twiddled (by cross-checking between, say, 64 different subscriptions) and flip a fair coin to determine the output. There's a chance of only 2 in 2^64 that the thieves fail to detect the twiddle.

Dow Jones strategy: twiddle some bits with low probability (e.g. p=0.01). Reasonably often, the bit values will be the same for all thieves. If the thieves use the flip-a-coin strategy, we can determine which tag bits they've failed to detect, and identify them that way.
Thief countermeasure: use a majority vote.

Dow Jones strategy: hybrid of the two.
Thief countermeasure: hybrid of the two. Flip a coin if the vote is fairly even, go with the majority if the vote is uneven. For example, get 64 subscriptions, go with the majority vote if fewer than 16 dissenters, flip a fair coin otherwise.

This last strategy for the thieves is the one I can't beat. Theoretical help, anyone?

-- Marc Ringuette (mnr@cs.cmu.edu)
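The hybrid thief countermeasure from the post above can be sketched as a small simulation. This is only an illustrative sketch, not code from the original post: the function names, the model of each tag position as an independent coin flip with probability p per subscriber, and the simulation sizes are all assumptions. The 64 copies and the 16-dissenter threshold are the figures given in the post.

```python
import random


def hybrid_output(thief_bits, max_dissenters, rng):
    """Pick the bit the thieves publish at one tag position.

    Majority vote if fewer than max_dissenters copies disagree with the
    majority; otherwise (or on an exact tie) a fair coin flip.
    """
    ones = sum(thief_bits)
    zeros = len(thief_bits) - ones
    dissenters = min(ones, zeros)
    if dissenters < max_dissenters and ones != zeros:
        return 1 if ones > zeros else 0
    return rng.randrange(2)


def match_rates(p, n_thieves=64, max_dissenters=16, n_positions=5000, seed=0):
    """Estimate how often the published bit matches one thief's copy
    and how often it matches an innocent subscriber's copy.

    Assumption: every subscriber's bit at every position is an
    independent flip that comes up 1 with probability p.
    """
    rng = random.Random(seed)
    thief_hits = 0
    innocent_hits = 0
    for _ in range(n_positions):
        thieves = [int(rng.random() < p) for _ in range(n_thieves)]
        innocent = int(rng.random() < p)
        out = hybrid_output(thieves, max_dissenters, rng)
        thief_hits += thieves[0] == out
        innocent_hits += innocent == out
    return thief_hits / n_positions, innocent_hits / n_positions


if __name__ == "__main__":
    for p in (0.5, 0.2, 0.01):
        thief, innocent = match_rates(p)
        print(f"p={p}: thief match {thief:.3f}, innocent match {innocent:.3f}")
```

Under this model, Dow Jones can only identify a thief if the thief's match rate is noticeably higher than an innocent subscriber's. Comparing the two rates for different values of p is one way to explore whether any twiddle probability beats the hybrid strategy.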
Can I suggest that any messages posted to cypherpunks start with "Cypher:" in the subject line? The mail from this list is getting mixed in with all my other mail, because my newsreader (elm) can't sort on "To:" fields. Does anyone else have this problem? Does this idea seem reasonable?

Jim C.
I use the following .forward file to make slocal "sort" my mail based upon the contents of the .maildelivery file below.

-- $HOME/.forward --
| /usr/lib/mh/slocal -user nowhere
<EOF>

You should use something like the following .maildelivery file to tell slocal where to put the messages.

-- $HOME/.maildelivery --
#
# field "pattern" action "command"
#
To "cypherpunks@toad.com" file ? Mail/cypherpunks

This will file messages directed to cypherpunks to a file in your Elm mail directory, but leave all other messages untouched. You have to then choose the folder "=cypherpunks" to read those messages.

NOTE: You need to change the path of slocal to the appropriate path for your system. You can find it with the whereis -b command or the find utility. Am I forgetting anything?

Chael Hall
--
Chael Hall
nowhere@bsu-cs.bsu.edu, 00CCHALL@BSUVC.BSU.EDU
(317) 285-3648 after 5 pm EST
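For readers who want to see the routing rule spelled out, here is a rough sketch in Python of the decision the .maildelivery line above expresses: look at a header, and if it contains a pattern, pick a folder. It is not slocal and does not deliver mail. The names `choose_folder` and `RULES`, the "inbox" default, and the case-insensitive substring match are assumptions of this sketch, not a description of slocal's exact matching rules.

```python
from email import message_from_string

# (header, pattern, folder): mirrors the single .maildelivery rule above.
RULES = [("To", "cypherpunks@toad.com", "Mail/cypherpunks")]


def choose_folder(raw_message, rules=RULES, default="inbox"):
    """Return the folder of the first rule whose header contains its pattern.

    Matching here is a case-insensitive substring test (an assumption of
    this sketch). Messages that match no rule go to the default folder.
    """
    msg = message_from_string(raw_message)
    for header, pattern, folder in rules:
        for value in msg.get_all(header, []):
            if pattern.lower() in str(value).lower():
                return folder
    return default


if __name__ == "__main__":
    sample = "To: cypherpunks@toad.com\nSubject: test\n\nHello.\n"
    print(choose_folder(sample))
```

A message addressed to the list is routed to Mail/cypherpunks, and anything else falls through to the default, which corresponds to "leave all other messages untouched" in the description above.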
There's a program called "filter" (which I think is part of the elm distribution) that I use to automatically route messages from different mailing lists to separate folders, which can then be read at leisure. Very handy!

derek
On Mar 18, 12:05, Brad Huntting wrote:
Subject: Re: Cypher: Subject naming proposal
The mail from this list is getting mixed in with all my other mail, cause my newsreader (elm) can't sort on "To:" fields.
Perhaps you should get a better mail reader (e.g. MH).
-- End of excerpt from Brad Huntting

mush will also allow filtering based on more or less whatever you want (e.g. To: fields).

Mark
--
Mark Henderson markh@wimsey.bc.ca
-----BEGIN PGP SIGNED MESSAGE-----
On Thu, 18 Mar 93 10:39:46 EST, nowhere@bsu-cs.bsu.edu (Chael Hall) said:
Hall> I use the following .forward file to make slocal "sort" my mail based
Hall> upon the contents of the .maildelivery file below.

[snip]

Hall> NOTE: You need to change the path of slocal to the appropriate
Hall> path for your system. You can find it with the whereis -b command or
Hall> the find utility. Am I forgetting anything?

Erm, only that this apparently appears to pretty much _require_ switching mailreaders to MH. A more transparent solution can be achieved with the 'procmail' package, available from any comp.sources.misc archive. This package allows rule-based filtering on message content, size, and other factors, and can be installed workably with most mailreaders to my knowledge, without requiring much effort.

Hall> Chael Hall
Hall> --
Hall> Chael Hall
Hall> nowhere@bsu-cs.bsu.edu, 00CCHALL@BSUVC.BSU.EDU
Hall> (317) 285-3648 after 5 pm EST

Crys Rides

-----BEGIN PGP SIGNATURE-----
Version: 2.2

iQCVAgUBK6kEyJSqD+bQ7So3AQH/fwQAuRsviaD3uoG8VFU6nM2IDz+Nllbc5+KO
o3wCYGg7S15skdCjz+p7s97hAJlQ+IKtAdMia0Hya6W4cDOUHJGTlXeMmSXlEKlu
2W9kZN8bAR6D4TkuW0RqMFAzCW0U+87VajKO28IZLSEFGo1KPbFYlVP2eXsi/mPj
UND/fuivjzU=
=5b+o
-----END PGP SIGNATURE-----
Erm, only that this apparently appears to pretty much _require_ switching mailreaders to MH. A more transparent solution can be achieved with the 'procmail' package, available from any comp.sources.misc archive. This package allows rule-based filtering on message content, size, and other factors, and can be installed workably with most mailreaders to my knowledge, without requiring much effort.
I don't know what you mean... The incoming mail ends up in /var/spool/mail (on my system) whether or not you use it. I use ELM as my mailreader and everything works fine. He did say that he is using ELM. To me, this is less effort than FTPing 'procmail.'
Crys Rides
Chael Hall
--
Chael Hall
nowhere@bsu-cs.bsu.edu, 00CCHALL@BSUVC.BSU.EDU
(317) 285-3648 after 5 pm EST