[liberationtech] data mine the snowden files [was: open the snowden files]

M. C. McGrath shidash at shidash.com
Tue Jul 8 18:53:33 PDT 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I've been working on tools to do exactly this: to make it easier for
journalists to rapidly analyze documents and combine different docs
and datasets (http://transparencytoolkit.org/). These mostly include
tools for collecting data (uploading docs and converting them to a
standard format, scraping pages, pulling data from APIs), filtering
through docs (search/browsing tools, entity extraction, combining and
cross-referencing, keyword extraction), and visualizing info (in maps,
timelines, network graphs).
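
To make the "entity extraction, combining and cross-referencing" step
concrete, here is a rough sketch of the idea. It is not code from the
toolkit itself. It assumes the docs are already plain text, and it uses
a naive stand-in for real entity extraction (runs of capitalized words
and all-caps terms). The function names, the regex, and the sample docs
are all made up for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumption: a crude heuristic stands in for real entity extraction.
# It matches runs of two or more capitalized words, or all-caps terms.
ENTITY_RE = re.compile(r"\b(?:[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+|[A-Z]{2,})\b")


def extract_entities(text):
    """Return the set of candidate entities found in one document."""
    return set(ENTITY_RE.findall(text))


def cross_reference(docs):
    """Map each entity to the ids of the documents that mention it,
    keeping only entities that appear in more than one document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text):
            index[entity].add(doc_id)
    return {e: ids for e, ids in index.items() if len(ids) > 1}


def cooccurrence_edges(docs):
    """Count how often two entities appear in the same document.
    The result can be fed to a network graph as weighted edges."""
    edges = defaultdict(int)
    for text in docs.values():
        for a, b in combinations(sorted(extract_entities(text)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Invented sample documents, for illustration only.
docs = {
    "a": "Analyst at Example City working on FOOBAR.",
    "b": "Contractor supporting FOOBAR and BAZQUX deployments.",
    "c": "Engineer based near Example City.",
}

shared = cross_reference(docs)
edges = cooccurrence_edges(docs)
```

Here `shared` links docs through the terms they have in common, and
`edges` gives the pairs you would draw in a network graph. A real
pipeline would swap the regex for a proper entity extractor and
normalize spelling variants before matching.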

Where possible, I've been basing these on existing open source
software, but I also frequently build new tools and heavily modify
existing ones. I'd love to hear suggestions for tools to make or use
cases to support.

More info:
Demo: http://demo.transparencytoolkit.org
Analysis Platform:
https://github.com/TransparencyToolkit/Transparency-Toolkit
All Tools: https://github.com/transparencytoolkit
Network graph generated with TT from LinkedIn profiles mentioning NSA
surveillance programs: http://transparencytoolkit.org/nsanetwork.html
Article about the above:
http://america.aljazeera.com/articles/2014/5/29/nsa-contractors-linkedinprofiles.html
Thoughts on how to use tools like this effectively:
https://www.theengineroom.org/how-to-find-and-mash-online-info-for-anticorruption/

On 07/08/2014 03:27 PM, grarpamp wrote:
> On Tue, Jul 8, 2014 at 4:11 PM, coderman <coderman at gmail.com>
> wrote:
>> On Tue, Jul 8, 2014 at 1:05 PM, Griffin Boyce
>> <griffin at cryptolab.net> wrote:
>>> One approach is to take the existing public data, make some
>>> assumptions (educated guesses) and do additional research on
>>> top of that. It's what I'm doing right now. It's also what led
>>> to the original cointelpro revelations. Before the follow-up
>>> research, it was a meaningless acronym.
>>> 
>>> Find, extrapolate, expand.
>> 
>> this is the type of effort i was hoping to see undertaken.
>> 
>> when you say "additional research", is this organic or
>> structured? tool assisted or old skewl?
>> 
>> i too have been building up some terms and technologies, but yet
>> to put it into any structured format with context, as part of my
>> post is to see how others are handling the vast complexity and
>> extensive compartmentalization embodied in the leaks to date.
>> 
>> i also would like to pursue this research anonymously, on hidden 
>> services rather than public sites or email.
> 
> To do any of this you will need to collect all the releases of
> docs and images to date, in their original format (not AP
> newsspeak), in one place. Then dedicate much time to normalizing,
> convert to one format and import into tagged document store, etc.
> Yes, this could be hosted on the darknet.
> 

- -- 
M. C. McGrath
Transparency Toolkit | http://transparencytoolkit.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Icedove - http://www.enigmail.net/

iF4EAREIAAYFAlO8oJ0ACgkQHKENpovrR8UKmAEAhY06O24ReM52Us56SBSJZDu+
JKIjm0Juw+lG43vsxAQA/2lIAIipDU9BfYyA7+G9Uv0pwTzxhC9Ubnc7Yyd4H715
=uM9l
-----END PGP SIGNATURE-----


