Re: www.nsa-observer.net
Swell initiatives by all the Snowden distributors. Except most fall prey to PDF manipulation of tagging, implanting, tracking, by willful intent or by technical ignorance. It is virtually impossible to sanitize PDFs, due both to Adobe's inherent design to meticulously spy on use of its products and to deluded user attempts to hide who, when, what, how, from creation to modification to wiping to stego to signing to forging to stinging. Image or accessible-text or other PDF formats, locked, redacted, watermarked, et al. Well behind and below the metadata Adobe allows users and abusers to see lurks Adobe's advanced persistent meta-meta, aiming at NSA/Tor-style multi-level obscurantism in which accessible meta is used to hide the inaccessible. Below dark is darker, ever darkening as new tools are developed to pry farther into the less visible and off-oscilloscopic spectrum. DoD issued today a short directive on the Center for Countermeasures and Counter-Countermeasures (including cyberweapons like PDFs). Presume NSA, aided by Adobe, has tracked, and will track, Snowden's material in all its iterations, from the time he snatched it to the latest distributor, consumer and secret archiver in Oahu, Maryland, Hong Kong, Berlin, Rio, NYC, DC, online and off. PDFs (and DOCs) are more treacherous than log files, backdoors, 0-days, APTs, what have you, due to their popularity. HTM and TXT are much safer. Courier fonts are safer than other fonts, especially those promoted by Adobe and others who use fonts to spy (all rationalized as protection of IP, i.e., comparable to natsec).
At 09:29 PM 1/30/2015, you wrote:
On 1/30/15, grarpamp <grarpamp@gmail.com> wrote:
https://www.nsa-observer.net/ https://github.com/nsa-observer/
fyi, coderman et al.
thanks, checking them out. one thing i don't see mentioned is how the OCR was performed. same as Reuters DocumentCloud service, or open source tool, or ?
next bigsun update will demonstrate this challenge better, as i am using a handful of techniques for text extraction, character recognition, and annotation, as well. in a sense, this is how the sausage making gets started...
(i will see if there is a convenient way i can feed results back out again, e.g. to nsa-observer, since bigsun is intended to be operated entirely within hidden services - no public services, especially not GitHub or DocumentCloud)
best regards,
On Jan 31, 2015, at 8:59 AM, John Young <jya@pipeline.com> wrote:
Swell initiatives by all the Snowden distributors. Except most fall prey to PDF manipulation of tagging, implanting, tracking, by willful intent or by technical ignorance.
It is virtually impossible to sanitize PDFs due to Adobe's inherent design to meticulously spy on use of its products as well as deluded user attempts to hide who, when, what, how by users from creation to modification to wiping to stego to signing to forging to stinging. Image or accessible-text or other PDF formats, locked, redacted, watermarked, et al.
Pardon my ignorance about this, and I will do my own research, but do these hidden formattings/stego/call-home functions disappear, get mutilated, become broken when ‘converting’ such PDF documents to other document types via use of many ‘conversion’ tools (Calibre comes to mind instantly), or are these embedded organisms persistent across any automated conversion routine? Cheers, Benjamin
On 1/31/15, Benjamin Brewer <bbrewer@littledystopia.net> wrote:
... Pardon my ignorance about this, and I will do my own research, but do these hidden formattings/stego/call-home functions disappear, get mutilated, become broken when ‘converting’ such PDF documents to other document types
we can wax lyrical about all the ways to sanitize a boundary through constraint, perhaps twice over, to be sure. that said, consider a Qubes OS setup where conversion between formats (across app domains) always goes to the least complicated, most easily verified well-formed type, even constrained through omission. then a PDF reduced to plain text, 80 columns by 42 lines per page, fixed width, printable ASCII only, could probably be interpreted into sentences, a way to collaborate separately without excessively leaking information among participants, maybe. in other words, PDFs and similar rich, obfuscated types are the adversary's playground. does this mean all PDFs are compromised? Of course not. But if you're a target, a specific PDF of specific structure could very well be an effective honey token and target you precisely.
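A minimal sketch of the final, constrained plain-text step, assuming the 80-column, 42-lines-per-page, printable-ASCII-only layout from the post. It does not extract text from a PDF (that is a separate, riskier step), and the function names are hypothetical.

```python
import textwrap

# Layout assumed from the post: fixed width, 80 columns, 42 lines per page.
COLS = 80
LINES_PER_PAGE = 42

def to_printable_ascii(text: str) -> str:
    """Constraint through omission: keep printable ASCII (0x20-0x7E) and
    newlines, turn other whitespace into a plain space, drop everything else
    (zero-width characters, look-alikes, control codes, accented letters)."""
    out = []
    for ch in text:
        if ch == "\n" or 0x20 <= ord(ch) <= 0x7E:
            out.append(ch)
        elif ch.isspace():
            out.append(" ")
    return "".join(out)

def paginate(text: str) -> list[list[str]]:
    """Wrap each paragraph to COLS columns, then cut the lines into pages."""
    lines: list[str] = []
    for para in to_printable_ascii(text).split("\n"):
        lines.extend(textwrap.wrap(para, width=COLS) or [""])
    return [lines[i:i + LINES_PER_PAGE] for i in range(0, len(lines), LINES_PER_PAGE)]
```

Dropping rather than transliterating is deliberately lossy: "café" becomes "caf", but no hidden character class survives the trip.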
... via use of many ‘conversion’ tools (Calibre comes to mind instantly) or are these embedded organisms persistent across any automated conversion routine?
consider a watermark that, resized by half, still persists. this is the kind of meta-level manipulation of structure you may see in a rich document (PDF) that could still persist through some transformations. in other words, it depends on your threat model: who is tainting your documents in-line, silently, without your awareness, and how complicated the formats and resulting transformations are. as another example, this is why i reference even simplified subsets of text by a self-certifying identifier, like afb1e384e450d644703ad96cdfe9f728be509854388687eb65b7c622e2f798a9 , e.g. bigsundaawafn36e.onion/shid/afb/1e3/afb1e384e450d644..5b7c622e2f798a9 , or http://sunshineeevvocqr.onion/bigsun/raw/afb1e384e450d644..5b7c622e2f798a9 , which is the same paragraph in ASCII whether its origin was PDF, Word or HTML, simplified to a text paragraph. then mutually untrusting individuals collaborating from a distance can use this shared address space as the base for cooperation. if that doesn't make sense, i will explain it better later. :) best regards,
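A minimal sketch of a self-certifying paragraph identifier. The identifier above is 64 hex digits, but the thread names neither the hash nor the normalization rules, so SHA-256 over a simple ASCII normalization is an assumption here; the `shid/` layout in `sharded_path` is hypothetical, modeled only on the shape of the .onion URL.

```python
import hashlib
import re

def normalize_paragraph(text: str) -> str:
    """Assumed canonical form: any whitespace (newlines, tabs, no-break spaces)
    becomes a space, other non-printable-ASCII characters are dropped, runs of
    spaces collapse to one, and the ends are trimmed."""
    kept = []
    for ch in text:
        if ch.isspace():
            kept.append(" ")
        elif 0x20 <= ord(ch) <= 0x7E:
            kept.append(ch)
    return re.sub(r" {2,}", " ", "".join(kept)).strip()

def paragraph_id(text: str) -> str:
    """Hex SHA-256 of the normalized paragraph. Anyone holding the text can
    recompute it, so the identifier certifies its own content."""
    return hashlib.sha256(normalize_paragraph(text).encode("ascii")).hexdigest()

def verify(text: str, claimed_id: str) -> bool:
    """Check a received paragraph against the identifier it was cited by."""
    return paragraph_id(text) == claimed_id.lower()

def sharded_path(ident: str) -> str:
    """Hypothetical layout: two 3-hex-digit directory levels, then the full id."""
    return f"shid/{ident[:3]}/{ident[3:6]}/{ident}"
```

No claim is made that this reproduces the exact identifier quoted in the thread. The design point is that the name is derived from the content, so parties who don't trust each other need no shared authority to agree on what it refers to.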
and yes, some absolutely call home, e.g. embedded tracking pixels and kin. you should always load complex formats in a safe container without network access! (if you care) these are also the most easily stripped out by a simple format conversion (pdf2txt, etc.). best regards,
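For the call-home question, one crude triage step, in the spirit of tools such as Didier Stevens' pdfid, is to count PDF name keys tied to actions, scripts and remote fetches before opening a file at all. A minimal sketch, assuming uncompressed objects: names inside compressed object streams are invisible to it, so a clean result proves nothing. The key list is an illustrative selection, not exhaustive.

```python
import re
from collections import Counter

# Name keys associated with scripts, automatic actions, remote fetches, payloads.
RISKY_KEYS = {b"JavaScript", b"JS", b"OpenAction", b"AA", b"Launch",
              b"URI", b"SubmitForm", b"GoToR", b"EmbeddedFile", b"RichMedia"}

# A PDF name is "/" followed by regular characters (no whitespace or delimiters).
NAME_RE = re.compile(rb"/([^\s/<>\[\]()%{}]+)")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")

def decode_name(raw: bytes) -> bytes:
    """Undo #xx hex escapes, which can disguise a key (e.g. /J#61vaScript)."""
    return HEX_ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)

def scan_pdf_bytes(data: bytes) -> Counter:
    """Count risky name keys in the raw file bytes. Heuristic only."""
    hits: Counter = Counter()
    for m in NAME_RE.finditer(data):
        name = decode_name(m.group(1))
        if name in RISKY_KEYS:
            hits[name.decode("ascii")] += 1
    return hits
```

Usage would be `scan_pdf_bytes(open(path, "rb").read())` inside the offline container; any hit is a reason to convert to text rather than render.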
On Sat, Jan 31, 2015 at 04:06:13PM -0800, coderman wrote:
via use of many ‘conversion’ tools (Calibre comes to mind instantly) or are these embedded organisms persistent across any automated conversion routine?
consider a watermark that, resized by half, still persists. this is the kind of meta-level manipulation of structure you may see in a rich document (PDF) that could still persist through some transformations.
there was this lawyer in .no who lost his license because he leaked photos of Anders Behring Breivik to the press. police watermarked a set of bait photos and gave them to the lawyers of the families with dead kids. the press made a photo of the pictures, printed the photo in a test newspaper, took a photo again, and printed that. so a lot of ADC-DAC conversions in between. the size also got significantly smaller. yet the watermark was clearly identifiable. it turned out later that this was some kind of Photoshop plugin, which is "primarily for tracking copyright violations"
-- otr fp: https://www.ctrlc.hu/~stef/otr.txt
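To illustrate why a mark can survive resizing and re-photographing, here is a toy spread-spectrum watermark on a one-dimensional signal standing in for an image. It is not the Photoshop plugin from the story, whose method is unknown here; the block length, strength, and "recapture" model (gain, offset, sensor noise, rounding) are assumptions chosen for illustration. Each key-seeded chip spans several samples, so halving the resolution averages the host down while leaving the chips intact, and a correlation detector that knows the key still finds them.

```python
import math
import random

BLOCK = 8    # each watermark chip spans 8 samples (low frequency survives resizing)
ALPHA = 2.0  # embedding strength, small relative to the host's spread

def pattern(key: int, n: int) -> list[float]:
    """Key-seeded +/-1 chips, each repeated BLOCK times."""
    rng = random.Random(key)
    chips = [rng.choice((-1.0, 1.0)) for _ in range(n // BLOCK)]
    return [c for c in chips for _ in range(BLOCK)]

def embed(host: list[float], key: int) -> list[float]:
    return [h + ALPHA * p for h, p in zip(host, pattern(key, len(host)))]

def halve(sig: list[float]) -> list[float]:
    """Resize to half length by averaging neighbouring pairs."""
    return [(sig[i] + sig[i + 1]) / 2 for i in range(0, len(sig) - 1, 2)]

def recapture(sig: list[float], seed: int) -> list[float]:
    """Crude print-and-photograph stand-in: gain, offset, noise, quantization."""
    rng = random.Random(seed)
    return [round(0.8 * s + 20 + rng.gauss(0, 3)) for s in sig]

def correlation(suspect: list[float], key: int, original_len: int) -> float:
    """Normalized correlation with the key's pattern, resized the same way
    (one halving is assumed known to the detector)."""
    ref = halve(pattern(key, original_len))[:len(suspect)]
    mean = sum(suspect) / len(suspect)
    centered = [s - mean for s in suspect]
    num = sum(c * r for c, r in zip(centered, ref))
    den = math.sqrt(sum(c * c for c in centered) * sum(r * r for r in ref))
    return num / den
```

Real schemes work on 2-D images, often in a transform domain, and tolerate cropping and rotation, which this sketch does not attempt.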
This is the classic "Barium Meal Test", a highly effective way to find a mole. I have been meaning for some time to write a quick Python script which implements the Barium Meal Test on plain text, perhaps even in a distributed way; by making common misspellings, by replacing whitespace with Unicode equivalents (bit too obvious?), by making synonymous punctuation modifications - a dash rather than a semicolon as in this sentence couplet, for example. Part of the idea is to help identify "moles" in follower networks of "private users" in P2P social networks, when people "retweet" private messages. If you could divide your followers into groups and give each group a different barium meal'd message, then after a few "leaks" you'd be able to identify likely leakers. The other part was to point out how easy such identifying substitutions are to make, and to make people acutely aware of the risks involved in sharing potentially watermarked information.
On 01/02/15 09:41, stef wrote:
On Sat, Jan 31, 2015 at 04:06:13PM -0800, coderman wrote:
via use of many ‘conversion’ tools (Calibre comes to mind instantly) or are these embedded organisms persistent across any automated conversion routine?
consider a watermark that, resized by half, still persists. this is the kind of meta-level manipulation of structure you may see in a rich document (PDF) that could still persist through some transformations.
there was this lawyer in .no who lost his license because he leaked photos of Anders Behring Breivik to the press. police watermarked a set of bait photos and gave them to the lawyers of the families with dead kids. the press made a photo of the pictures, printed the photo in a test newspaper, took a photo again, and printed that. so a lot of ADC-DAC conversions in between. the size also got significantly smaller. yet the watermark was clearly identifiable. it turned out later that this was some kind of Photoshop plugin, which is "primarily for tracking copyright violations"
-- Twitter: @onetruecathal Phone: +353876363185 miniLock: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM peerio.com: Use email or phone. Uses above miniLock key.
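A minimal sketch of the plain-text Barium Meal Test described above: each of k marking sites has two interchangeable forms (a misspelling, a no-break space in place of a normal space, a semicolon versus a dash, terminal punctuation), so one template yields 2^k distinguishable copies, one per group. The template, sites and group assignments are hypothetical. Reshuffling group membership between messages lets successive leaks be intersected down to a likely leaker.

```python
from __future__ import annotations

# Hypothetical template; each {n} is a marking site.
TEMPLATE = "We will {0} the two drafts by Friday{1}{2} between us. {3}"

# Each site: (form for bit 0, form for bit 1). Neither form may occur anywhere
# else in the rendered text, or decoding becomes ambiguous.
SITES = [
    ("separate", "seperate"),         # common misspelling
    ("; ", " - "),                    # semicolon vs dash
    ("keep this", "keep\u00a0this"),  # normal space vs no-break space
    ("Thanks.", "Thanks!"),           # terminal punctuation
]

def encode(group: int) -> str:
    """Render the copy for one group; bit i of `group` picks the form at site i."""
    forms = [SITES[i][(group >> i) & 1] for i in range(len(SITES))]
    return TEMPLATE.format(*forms)

def decode(leak: str) -> int | None:
    """Recover the group number from a leaked copy, or None if any site was
    altered or stripped (e.g. the text was retyped or normalized)."""
    group = 0
    for i, (form0, form1) in enumerate(SITES):
        has0, has1 = form0 in leak, form1 in leak
        if has1 and not has0:
            group |= 1 << i
        elif not (has0 and not has1):
            return None
    return group

def narrow(rounds: list[tuple[dict[int, set[str]], str]]) -> set[str]:
    """Intersect the groups implicated by each leak. Each round carries its own
    group assignment, so membership should be reshuffled between messages."""
    candidates: set[str] | None = None
    for members_by_group, leak in rounds:
        group = decode(leak)
        if group is None:
            continue
        implicated = members_by_group.get(group, set())
        candidates = set(implicated) if candidates is None else candidates & implicated
    return candidates if candidates is not None else set()
```

Returning None on a damaged copy, rather than guessing, is the cautious choice: the conversion-to-plain-ASCII defence discussed earlier in the thread would destroy the no-break-space site and should register as "unknown", not as an accusation.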
On 1/31/15, coderman <coderman@gmail.com> wrote:
... as another example, this is why referencing even simplified subsets of text by a self certifying identifier, like afb1e384e450d644703ad96cdfe9f728be509854388687eb65b7c622e2f798a9 , e.g. bigsundaawafn36e.onion/shid/afb/1e3/afb1e384e450d644..5b7c622e2f798a9 , or http://sunshineeevvocqr.onion/bigsun/raw/afb1e384e450d644..5b7c622e2f798a9 which is the same paragraph in ascii no matter PDF or Word or HTML origin simplified to text paragraph.
the text behind that identifier is: And I'll go one further. Everything's secret. I mean, I got an e-mail saying, "Merry Christmas." It carried a Top Secret NSA classification marking. The easy option is to classify everything. This is an Agency that for the most of its existence was well served by not having a public image. When the nation felt its existence was threatened, it was willing to cut agencies like NSA quite a bit of slack. But as that threat perception decreases, there is a natural tendency to say, "Now, tell me again what those guys do?" And, therefore, the absence of a public image seems to be less useful today than it was 25 years ago. I don't think we can survive without a public image. (U//FOUO)
i should have included it at first, as odd, opaque links without context are an entropy prank. [ https://twitter.com/nickm_tor/status/549651166834225153 :P ]
Depends on the converter, whether it keeps the Adobe spying features, wittingly or unwittingly -- which it may or may not be aware of. And whether it has a deal with Adobe to retain them disguised. Adobe's hidden code is quite devious -- for example, it may remain hidden in order to use the converted version as a host germ carrier to propagate itself, following the bio model in our guts of using feces as fertilizer. Be careful about using free products, such as Adobe Reader and free converters of formats of all kinds. They often contain germs, similar to the way NSA and other spies implant germs in innocuous programs and platforms -- "free" is as devious as "open" to those who exploit public trust in freedom and openness. Journalism is a prime exploiter, under the brand of freedom of the press, exploiting with constitutional protection. Gmail is one of the most notorious germ transmitters. Tor is not quite as evil, but only because it is newer than the Internet itself, the Internet Archive, Wikipedia, social media, PGP, and many more which may have had noble origins but have been adopted (and bought) by converters of good to evil -- most readily by government contracts, vulture capitalism, being desperately broke and in debt, entrapment and coercion by law enforcement -- or all of them. This list is beyond good and evil, thus spoke Zarathustra, aka TCM, JG, EH. Though those gods are dead.
Benjamin Brewer wrote:
On Jan 31, 2015, at 8:59 AM, John Young <jya@pipeline.com> wrote:
Swell initiatives by all the Snowden distributors. Except most fall prey to PDF manipulation of tagging, implanting, tracking, by willful intent or by technical ignorance.
It is virtually impossible to sanitize PDFs due to Adobe's inherent design to meticulously spy on use of its products as well as deluded user attempts to hide who, when, what, how by users from creation to modification to wiping to stego to signing to forging to stinging. Image or accessible-text or other PDF formats, locked, redacted, watermarked, et al.
Pardon my ignorance about this, and I will do my own research, but do these hidden formattings/stego/call-home functions disappear, get mutilated, become broken when 'converting' such PDF documents to other document types via use of many 'conversion' tools (Calibre comes to mind instantly), or are these embedded organisms persistent across any automated conversion routine? Cheers, Benjamin
| Depends on the converter, whether it keeps the Adobe spying
| features witting or unwitting -- which it may be aware of or not.
| And whether it has a deal with Adobe to retain disguised.
| ...
John, you know this I'm sure, but for the record the highest security places use sacrificial machines to receive e-mail and the like, to print said transmissions to paper, and then those (sacrificial) machines are sacrificed, which is to say they are reloaded/rebooted. Per message. The printed forms then cross an air gap and those are scanned before transmission to a final destination on networks of a highly controlled sort.
I suspect, but do not know, that the sacrificial machines are thoroughly instrumented in the countermeasure sense. For the entities of which I speak, the avoidance of silent failure is taken seriously -- which brings us 'round to your (and my) core belief:
The sine qua non goal of security engineering is "No Silent Failure."
--dan
On Tue, Feb 3, 2015 at 11:43 AM, <dan@geer.org> wrote:
(sacrificial) machines are sacrificed, which is to say they are reloaded/rebooted. Per message.
Network booting a known image is common. Putting the print system in hardware is possible too.
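One way to make "known image" concrete is to pin the digest of the golden image when it is built and refuse to boot or reload anything that doesn't match. A minimal sketch, assuming a SHA-256 digest recorded out of band; the function names are hypothetical, and this checks only the image file, not firmware or the other soft parts that would also need reinitializing.

```python
import hashlib
import hmac

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the image incrementally so large boot images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def image_is_known(path: str, pinned_hex: str) -> bool:
    """True only if the image matches the digest recorded for the golden build."""
    return hmac.compare_digest(sha256_file(path), pinned_hex.strip().lower())
```

In a per-message reload cycle this check would run before each network boot, and a mismatch should halt loudly rather than fall back, in keeping with "No Silent Failure."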
the sacrificial machines are thoroughly instrumented in the countermeasure sense. ... silent failure
Validation of correct operation, and detection, in the face of evil input seems much harder... any and all changes to memory dumps, files, firmware. All soft parts would need to be reinitialized, even becoming recursively expensive. All for a printer on the don't-care side of the air gap? Doubtful, so long as it passes test vectors. Your opponent's highest secrets are historically not likely to come to you embedded in a freaknasty PDF, but on foot. That may be changing [1]. Either way, sometimes nothing beats a roomful of human transcriptionists, translators and auditors with typewriters.
[1] Many a gem may even flow through each side's postmaster@ mail.
On 01/31/2015 06:59 AM, John Young wrote:
Swell initiatives by all the Snowden distributors. Except most fall prey to PDF manipulation of tagging, implanting, tracking, by willful intent or by technical ignorance.
<much cool stuff snipped> I don't see this as an issue for processors and distributors, as long as their OPSEC is adequate. We may prudently assume that Snowden's originals were tagged, watermarked, implanted, and so on. Given that, I trust that everyone working with the documents has behaved accordingly. If they haven't, it's all too likely that identities and relationships have been inadvertently revealed. Such is life. But for casual enthusiasts and the general public, this could be a serious issue. Even if documents were obtained securely, they could phone home. Scans by anti-malware apps could be uploaded to servers. Cloud-backup providers might look for them. ... How might one prevent that? What comes to mind is a Tor hidden-service site that serves scrubbed images and doesn't readily permit downloads. While OCR would be essential in processing documents, serving text arguably puts users at risk. Maybe that's obvious.
participants (8)
- Benjamin Brewer
- Cathal Garvey
- coderman
- dan@geer.org
- grarpamp
- John Young
- Mirimir
- stef