data mine the snowden files [was: open the snowden files]

On Sat, Jul 5, 2014 at 11:29 AM, Geert Lovink <geert@desk.nl> wrote:
an impasse of extremes, with a full or even a limited dump off the table. let's find a middle ground. how best to proceed?
very great indeed. what kind of tools would make the journalists involved more effective and productive?

1. using the leaks currently published, devise a framework for "data mining" the leak documents - that is, generating metadata from the data and running matching and relevance queries across that metadata to narrow the search and aggregate related efforts or technologies across their compartmentalized worlds.

2. #1 requires an index of special terms, techniques, suppliers, code names, algorithms, etc. that is used to generate the metadata for deeper search and to tie documents to general themes of surveillance.

3. extrapolating from current leaks, also look toward recent advancements and specific technical tell-tales of interest. doping silicon as a tailored access technique? this could refer to compromised runs of security processors for desired targets. etc.

4. justifying technical detail specifically. we have seen so little technical detail at the source code / hardware design level. how best to justify publishing source code - explaining that the language choice, the nature of the algorithms, and the structure of the distributed computing it runs on all convey critical technical details, important for understanding which parts of our technologies are compromised and for guiding the fixes required to protect against such compromises?

in short, it would behoove us to build tools to make the journalists more effective, rather than bitch about not being included in the inner circle. (sadly, many good knowledge discovery tools are proprietary and applied to open source intelligence.)

what types of features would you want such leak-assistant software to have? what types of existing tools, if any, would provide these capabilities?

best regards,
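a minimal sketch of what points 1 and 2 above could look like in python (the term index and file names are hypothetical placeholders, not drawn from any actual corpus): match each document against a hand-built index of code names and techniques, and emit one metadata record per document.

    import json
    import re
    from pathlib import Path

    # hand-curated index of special terms (point 2); extend as research uncovers more.
    # the entries here are illustrative placeholders only.
    TERM_INDEX = {
        "codenames": ["PRISM", "XKEYSCORE", "QUANTUM"],
        "techniques": ["tailored access", "implant", "interdiction"],
    }

    def extract_metadata(text):
        # return which indexed terms appear in a document's text, grouped by category
        found = {}
        for category, terms in TERM_INDEX.items():
            hits = [t for t in terms if re.search(re.escape(t), text, re.IGNORECASE)]
            if hits:
                found[category] = hits
        return found

    def build_corpus_metadata(doc_dir="leak_docs", out_file="metadata.json"):
        # walk a directory of plain-text documents, write one metadata record per file
        records = []
        for path in sorted(Path(doc_dir).glob("*.txt")):
            records.append({"doc": path.name,
                            "terms": extract_metadata(path.read_text(errors="ignore"))})
        Path(out_file).write_text(json.dumps(records, indent=2))
        return records

records that share terms can then be joined to aggregate related programs across the compartmentalized material, which is the "relevance across the metadata" step in #1.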

One approach is to take the existing public data, make some assumptions (educated guesses), and do additional research on top of that. It's what I'm doing right now. It's also what led to the original COINTELPRO revelations. Before the follow-up research, it was a meaningless acronym. Find, extrapolate, expand. ~ Griffin -- Sent from my tracking device. Please excuse brevity and cat photos.

On Tue, Jul 8, 2014 at 1:05 PM, Griffin Boyce <griffin@cryptolab.net> wrote:
hi Griffin! this is the type of effort i was hoping to see undertaken. when you say "additional research", is this organic or structured? tool assisted or old skewl? i too have been building up some terms and technologies, but have yet to put them into any structured format with context. part of my reason for posting is to see how others are handling the vast complexity and extensive compartmentalization embodied in the leaks to date. i also would like to pursue this research anonymously, on hidden services rather than public sites or email. best regards,

On July 8, 2014 4:11:44 PM EDT, coderman <coderman@gmail.com> wrote:
hi Griffin!
this is the type of effort i was hoping to see undertaken.
Me too ^_^ eventually I realized I'd have to do it myself if I wanted more info on Topic X. I obviously don't have access to the source, but there are some clear ways to expand on the material that's been released.
when you say "additional research", is this organic or structured? tool assisted or old skewl?
Right now, the aspect I'm researching requires lots of structured research, but I fully expect to come across something unexpected (a specific sourcing pattern, perhaps). Manual desk research is the new hotness. Well... maybe not. ;) It helps that I'm really good at it, so it doesn't take as much drudgery. Once collected, some things are trimmed and cleaned up using custom tools. But data collection is all manual.
i too have been building up some terms and technologies, but have yet to put them into any structured format with context.
Nice! :D I'd love to hear more about your conclusions sometime. I started by looking at one narrow outcome of the NSA's work that I find horribly disruptive to the ecosystem around my work. Now my task is to find further proof of this activity using unclassified source material, and possibly patterns within their work in this area.
i also would like to pursue this research anonymously, on hidden services rather than public sites or email.
Indeed. Lots of excellent reasons to be light on detail in these types of public forums. ~ Griffin -- Sent from my tracking device. Please excuse brevity and cat photos.
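A "trim and clean up" pass of the kind mentioned above can be very small; a minimal sketch in Python, assuming the manually collected notes sit in a plain-text file (file names are hypothetical):

    from pathlib import Path

    def clean_notes(in_file="collected_notes.txt", out_file="collected_notes.clean.txt"):
        # collapse runs of whitespace and drop exact duplicate lines, preserving order
        seen = set()
        cleaned = []
        for line in Path(in_file).read_text(errors="ignore").splitlines():
            line = " ".join(line.split())
            if line and line not in seen:
                seen.add(line)
                cleaned.append(line)
        Path(out_file).write_text("\n".join(cleaned) + "\n")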

On Tue, Jul 8, 2014 at 4:11 PM, coderman <coderman@gmail.com> wrote:
To do any of this you will need to collect all the releases of docs and images to date, in their original format (not AP newsspeak), in one place. Then dedicate much time to normalizing them, converting them to one format, importing them into a tagged document store, etc. Yes, this could be hosted on the darknet.
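A minimal sketch of that normalize-and-import step, assuming text has already been extracted from the originals and using SQLite as a stand-in for the tagged document store (directory, file, and table names are hypothetical):

    import sqlite3
    from pathlib import Path

    def import_docs(doc_dir="normalized_docs", db_path="leaks.db"):
        # load already-normalized plain-text documents into a small sqlite store
        db = sqlite3.connect(db_path)
        db.execute("CREATE TABLE IF NOT EXISTS docs (name TEXT PRIMARY KEY, source TEXT, body TEXT)")
        db.execute("CREATE TABLE IF NOT EXISTS tags (name TEXT, tag TEXT)")
        for path in sorted(Path(doc_dir).glob("**/*.txt")):
            db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?, ?)",
                       (path.name, str(path.parent), path.read_text(errors="ignore")))
        db.commit()
        return db

    def tag_doc(db, name, tag):
        # attach an analyst-assigned tag (program name, supplier, theme) to a document
        db.execute("INSERT INTO tags VALUES (?, ?)", (name, tag))
        db.commit()

A real pipeline would add full-text indexing and per-release provenance, but the shape is the same: one canonical text per document, plus freely attachable tags.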

I've been working on tools to do exactly this - to make it easier for journalists to rapidly analyze documents and combine different docs and datasets (http://transparencytoolkit.org/).

This mostly includes tools for collecting data (uploading docs and getting them into a standard format, scraping pages, pulling data from APIs), filtering through docs (search/browsing tools, entity extraction, combining and cross-referencing, keyword extraction), and visualizing info (in maps, timelines, network graphs). Where possible, I've been basing these on existing open-source software, but I also frequently build and heavily modify tools. I'd love to hear what suggestions people have for tools to make or use cases to cover.

More info -
Demo: http://demo.transparencytoolkit.org
Analysis Platform: https://github.com/TransparencyToolkit/Transparency-Toolkit
All Tools: https://github.com/transparencytoolkit
Network graph generated with TT from LinkedIn profiles mentioning NSA surveillance programs: http://transparencytoolkit.org/nsanetwork.html
Article about the above: http://america.aljazeera.com/articles/2014/5/29/nsa-contractors-linkedinprof...
Thoughts on how to use tools like this effectively: https://www.theengineroom.org/how-to-find-and-mash-online-info-for-anticorru...

On 07/08/2014 03:27 PM, grarpamp wrote:
-- M. C. McGrath
Transparency Toolkit | http://transparencytoolkit.org
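As an illustration of the cross-referencing and network-graph ideas described above (a sketch only, not Transparency Toolkit's actual code), the following links entities whenever they co-occur in the same record; the input format is an assumed JSON list of per-document entity lists:

    import json
    from collections import Counter
    from itertools import combinations
    from pathlib import Path

    def cooccurrence_edges(records_file="records.json"):
        # count how often each pair of entities appears in the same record
        records = json.loads(Path(records_file).read_text())
        edges = Counter()
        for rec in records:
            for a, b in combinations(sorted(set(rec.get("entities", []))), 2):
                edges[(a, b)] += 1
        return edges

    def write_edge_list(edges, out_file="graph_edges.tsv"):
        # emit a weighted edge list that any graph or visualization tool can ingest
        lines = ["%s\t%s\t%d" % (a, b, w) for (a, b), w in edges.most_common()]
        Path(out_file).write_text("\n".join(lines) + "\n")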

On Tue, Jul 8, 2014 at 3:27 PM, grarpamp <grarpamp@gmail.com> wrote:
indeed. i will also be hosting the complete cryptome archive on a hidden site, as it too is part of the corpus to feed into a normalization and extraction engine of great justice. i am using the various python image processing libraries to accomplish this, but any language or tool could be useful. i had hoped to distribute the cryptome archives further during the Paris hackfest; alas, unexpected events conspired otherwise. anyone who would like to host mirrors is welcome to tell me how they anticipate mirroring ~30G of data as quickly as possible. :)
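one possible shape for the python image processing step above, assuming Pillow and pytesseract are installed along with the tesseract OCR binary (directory names are placeholders):

    from pathlib import Path
    from PIL import Image       # pip install Pillow
    import pytesseract          # pip install pytesseract (needs the tesseract-ocr binary)

    def ocr_directory(img_dir="archive_scans", out_dir="extracted_text"):
        # run OCR over scanned pages so the text can feed normalization / extraction
        out = Path(out_dir)
        out.mkdir(exist_ok=True)
        for img_path in sorted(Path(img_dir).glob("*.png")):
            text = pytesseract.image_to_string(Image.open(img_path))
            (out / (img_path.stem + ".txt")).write_text(text)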

Tag the Cryptome Archive: "This is a trap, witting and unwitting. Do not use it or use at own risk. Source and this host is out to pwon and phuck you in complicity with global Internet authorities. Signed Batshit Cryptome and Host, 9 July 2014, 12:16ET." At 10:58 AM 7/9/2014, you wrote:

On Wed, Jul 9, 2014 at 12:17 PM, John Young <jya@pipeline.com> wrote:
Cryptome and JYA's curation, words, and work are important and a monument in their own right. Nuff said. As with other works in this class, I support this and other preservation, distribution, and downstream analysis efforts, and I support carrying whatever tag and preface he wishes to accompany them. Please ensure such frontmatter is attached.

On Wed, Jul 9, 2014 at 9:17 AM, John Young <jya@pipeline.com> wrote:
see attached. onion before torrent; rest TBD. also: http://cryptome.org/donations.htm best regards,
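for reference, serving such a mirror as an onion site typically comes down to a torrc stanza along these lines (the directory and local port are placeholders for whatever the actual host uses):

    # torrc excerpt: publish a local web server holding the mirror as a hidden service
    HiddenServiceDir /var/lib/tor/archive_mirror/
    HiddenServicePort 80 127.0.0.1:8080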


added example privoxy config as http_proxy to Tor, and added a sig note for Update 13. no further updates on list; contact directly if issues are encountered. best regards,
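the privoxy-as-http_proxy-to-Tor setup referenced above usually amounts to two lines in the privoxy config (ports shown are the defaults; the actual attached example may differ):

    # privoxy excerpt: accept local HTTP proxy connections, forward them to Tor's SOCKS port
    listen-address 127.0.0.1:8118
    forward-socks5t / 127.0.0.1:9050 .

clients then set http_proxy=http://127.0.0.1:8118/ and their HTTP traffic is relayed through Tor.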
participants (6)
- coderman
- edhelas
- grarpamp
- Griffin Boyce
- John Young
- M. C. McGrath