On Wed, 2013-04-24 at 12:12 -0500, Bryan Bishop wrote:
But if we need to start stripping journal names, paper titles, author names, etc., from PDFs, I am not sure how we would re-assemble that information later, because anyone else would be able to re-assemble it too and use it to find the "science violators".
Why not just use gpg to encrypt all the PDFs, using the hostname of the mirror as the password? This makes things slightly more difficult for us, but not impossible, and imposes a large cost on bots trying to enforce the Science Interdict. There are any number of small programmatic transformations we can apply to PDFs that make them not obviously PDFs to bots, but obviously PDFs to humans. Ultimately, this is how we'll have to go, because it's pretty easy to tell that diyhpl.us/text.pdf is the same as proprietaryjournal.com/text.pdf, even without watermarks.
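
To make "small programmatic transformation" concrete, here is a rough sketch of one, not anything the mirrors actually run. It XORs the file against a key derived from the mirror's hostname, so the "%PDF" magic bytes disappear but anyone who knows the hostname can undo it. The function name obfuscate and the choice of SHA-256 are my own assumptions for illustration, and this is obfuscation only, not real encryption the way gpg would be.

```python
import hashlib
from itertools import cycle


def obfuscate(data: bytes, hostname: str) -> bytes:
    """XOR data against a repeating key derived from the hostname.

    Hypothetical helper for illustration. XOR is its own inverse, so
    calling this again with the same hostname restores the original.
    This hides the file signature from naive scanners; it is NOT
    cryptographically secure.
    """
    # 32-byte key derived from the hostname (assumed scheme)
    key = hashlib.sha256(hostname.encode("utf-8")).digest()
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


if __name__ == "__main__":
    sample = b"%PDF-1.4\n%stand-in for a real PDF body\n"
    hidden = obfuscate(sample, "diyhpl.us")
    restored = obfuscate(hidden, "diyhpl.us")
    print(hidden[:4] != b"%PDF", restored == sample)
```

A human who knows which mirror served the file can reverse it trivially; a bot comparing bytes or hashes against the publisher's copy sees something unrelated, and has to do per-mirror work to recover it.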