Swell initiatives by all the Snowden distributors. Except most fall prey to PDF manipulation of tagging, implanting, tracking, by willful intent or by technical ignorance.

It is virtually impossible to sanitize PDFs due to Adobe's inherent design to meticulously spy on use of its products as well as deluded user attempts to hide who, when, what, how by users from creation to modification to wiping to stego to signing to forging to stinging. Image or accessible-text or other PDF formats, locked, redacted, watermarked, et al.

Well behind and below the metadata Adobe allows users and abusers to see, lurks Adobe's advanced persistent meta-meta to aim for NSA/Tor-multi-level obscurantism by which accessible meta is used to hide inaccessible. Below dark is darker, ever darkening as new tools are developed to pry farther into the less visible and off-oscilloscopic spectrum.

DoD issued today a short directive on the Center for Countermeasures and Counter-Countermeasures (including cyberweapons like PDFs).

Presume NSA, aided by Adobe, has tracked, will track, Snowden's material in all its iterations from the time he snatched it to the latest distributor, consumer and secret archiver in Oahu, Maryland, Hong Kong, Berlin, Rio, NYC, DC, online and off.

PDFs (and DOCs) are more treacherous than log files, backdoors, 0-days, APTs, what have you due to their popularity. HTM and TXT are much safer. Courier fonts are safer than other fonts, especially those promoted by Adobe and others who use fonts to spy (all rationalized as protection of IP -- ie, comparable to natsec).

At 09:29 PM 1/30/2015, you wrote:
On 1/30/15, grarpamp <grarpamp@gmail.com> wrote:
https://www.nsa-observer.net/ https://github.com/nsa-observer/
fyi, coderman et al.
thanks, checking them out. one thing i don't see mentioned is how the OCR was performed. same as Reuters DocumentCloud service, or open source tool, or ?
next bigsun update will demonstrate this challenge better, as i am using a handful of techniques for text extraction, character recognition, and annotation, as well. in a sense, this is how the sausage making gets started...
(i will see if there is a convenient way i can feed back out again, like to nsa-observer, since bigsun is intended to be operated entirely within hidden services - no public services, especially not github or document cloud)
best regards,
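
On the point at the top of this thread about metadata a PDF lets you see, here is a minimal sketch of how one might check a file for the documented identifying fields before passing it along. It is an illustration only, not a sanitizer: the names `MARKERS`, `find_metadata_markers` and `looks_clean` are made up for this example, and the check reads only the raw, undecoded bytes, so anything inside compressed streams or object streams goes unseen. A file that passes may still carry identifying data.

```python
import re

# Byte patterns for documented places a PDF records identifying data:
# the trailer's /Info reference and /ID array, the usual Info dictionary
# keys, and the header of an XMP metadata packet.
# Assumption: we scan raw bytes only; compressed objects are not decoded.
MARKERS = {
    "info_dict": rb"/Info\s+\d+\s+\d+\s+R",
    "xmp_packet": rb"<\?xpacket\s+begin=",
    "file_id": rb"/ID\s*\[",
    "producer": rb"/Producer\s*[(<]",
    "creator": rb"/Creator\s*[(<]",
    "creation_date": rb"/CreationDate\s*[(<]",
    "mod_date": rb"/ModDate\s*[(<]",
}


def find_metadata_markers(data: bytes) -> dict:
    """Count how often each marker appears in the raw PDF bytes."""
    return {name: len(re.findall(pattern, data)) for name, pattern in MARKERS.items()}


def looks_clean(data: bytes) -> bool:
    """True only if none of the markers appear; this is not proof of safety."""
    return not any(find_metadata_markers(data).values())
```

To try it on a file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `find_metadata_markers`. A nonzero count says the visible layer is present; a zero count says nothing about what sits below it.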