fingerprinting traffic at ISP for big content

8 Jun 2010

Recent events related to "big content" pursuing individual file sharers 
based on ISP logs are _very interesting_.

My first thought is that this usage is tracked via filename - you are 
guilty until proven otherwise if bittorrent traffic indicates a filename 
that matches [Hh][Uu][Rr][Tt].[Ll][Oo][Cc][Kk][Ee][Rr].
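
As a minimal sketch of what that kind of filename test might look like 
(the function name and the sample filenames are made up for 
illustration, and nothing here is known to be the method actually in 
use):

```python
import re

# Same idea as the bracketed character classes above, written with the
# IGNORECASE flag instead. The "." matches any single character, so
# "hurt.locker", "Hurt_Locker" and "HURT LOCKER" all match.
PATTERN = re.compile(r"hurt.locker", re.IGNORECASE)


def looks_incriminating(filename: str) -> bool:
    """Return True if the (hypothetical) pattern occurs anywhere in the name."""
    return PATTERN.search(filename) is not None


if __name__ == "__main__":
    for name in ("The.Hurt.Locker.2008.avi", "my_hurt locker_essay.txt"):
        print(name, looks_incriminating(name))
```

Note that the second sample name matches too, which is the weakness: 
the test says nothing about what is actually inside the file.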

But this seems shaky to me ... certainly bittorrent, already demonized, 
combined with an incriminating filename can shake loose a quick 
settlement, but in the long run this seems unsustainable.  Maybe they 
extend it to HTTP and FTP, etc., but you've still got an unknown file, 
interesting only in its name.

On the other end of things, the ISPs cannot be saving all of the data.

But what about fingerprinting it all ?  Let's think of a traffic backbone 
... at Comcast, for instance ... say 2 gigabits/s aggregate traffic ... 
the hashes themselves of all files (or, let's say, all files 100 MB in 
size or larger) won't take up much storage, but computing them takes a 
non-trivial amount of CPU.
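
A rough back-of-the-envelope check of the storage side, as a sketch 
only.  The 2 gigabits/s and 100 MB figures are the ones above; the 
20-byte digest (SHA-1 sized) and the assumption that the link is 
saturated with nothing but 100 MB files are mine, chosen to give an 
upper bound on the number of hashes:

```python
LINK_BITS_PER_SECOND = 2_000_000_000   # 2 gigabits/s, from the text
MIN_FILE_BYTES = 100_000_000           # 100 MB threshold, from the text
DIGEST_BYTES = 20                      # assumption: SHA-1 sized digest
SECONDS_PER_DAY = 86_400


def daily_estimate() -> dict:
    """Upper-bound estimate for one day of a fully saturated link."""
    bytes_per_second = LINK_BITS_PER_SECOND // 8
    bytes_per_day = bytes_per_second * SECONDS_PER_DAY
    # Worst case for hash count: every byte belongs to a minimum-size file.
    max_files_per_day = bytes_per_day // MIN_FILE_BYTES
    return {
        "bytes_per_second": bytes_per_second,
        "bytes_per_day": bytes_per_day,
        "max_files_per_day": max_files_per_day,
        "hash_bytes_per_day": max_files_per_day * DIGEST_BYTES,
    }


if __name__ == "__main__":
    for key, value in daily_estimate().items():
        print(f"{key}: {value:,}")
```

Under those assumptions the link carries 250 MB/s, or about 21.6 TB a 
day, while the digests for that day come to only a few megabytes.  
Storing the hashes is the easy part; touching every byte is not.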

a) Does anyone know what method is being used for these pursuits ?  I'm 
assuming a low tech "parse filenames in unencrypted BT traffic" but I 
haven't heard any details...

b) Once lawyers and ISPs collude to fully exploit this "revenue source", 
what is a reasonable course of action ?  Can they hash all files at that 
rate of data transfer ?  I'm wondering if that investment would be worth 
the settlements it produces...
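
One crude way to get a feel for the CPU side of (b) is to time a hash 
locally and compare it to the link rate.  This is only a sketch: it 
assumes SHA-1, a single core, and data already sitting in memory in 
order, and the function names are made up.  It ignores the work of 
reassembling flows and keeping per-file hash state, and BitTorrent 
delivers pieces out of order from many peers, so the real cost would 
be higher:

```python
import hashlib
import time

LINK_BYTES_PER_SECOND = 2_000_000_000 // 8  # the 2 gigabits/s figure above


def sha1_throughput(total_bytes: int = 64 * 1024 * 1024,
                    chunk: int = 64 * 1024) -> float:
    """Return measured SHA-1 throughput in bytes/second on this machine."""
    buf = bytes(chunk)          # a zero-filled buffer stands in for traffic
    hasher = hashlib.sha1()
    start = time.perf_counter()
    for _ in range(total_bytes // chunk):
        hasher.update(buf)      # incremental hashing, one chunk at a time
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed


def cores_needed(throughput: float) -> float:
    """How many such cores it would take just to hash at link rate."""
    return LINK_BYTES_PER_SECOND / throughput


if __name__ == "__main__":
    rate = sha1_throughput()
    print(f"SHA-1: {rate / 1e6:.0f} MB/s, cores needed: {cores_needed(rate):.2f}")
```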

John Case