www.nsa-observer.net
https://www.nsa-observer.net/ https://github.com/nsa-observer/ fyi, coderman et al.
On 1/30/15, grarpamp <grarpamp@gmail.com> wrote:
https://www.nsa-observer.net/ https://github.com/nsa-observer/
fyi, coderman et al.
thanks, checking them out. one thing i don't see mentioned is how the OCR was performed. same as Reuters DocumentCloud service, or open source tool, or ? next bigsun update will demonstrate this challenge better, as i am using a handful of techniques for text extraction, character recognition, and annotation, as well. in a sense, this is how the sausage making gets started... (i will see if there is a convenient way i can feed back out again, like to nsa-observer, since bigsun is intended to be operated entirely within hidden services - no public services, especially not github or document cloud) best regards,
On Fri, Jan 30, 2015 at 9:29 PM, coderman <coderman@gmail.com> wrote:
thanks, checking them out. one thing i don't see mentioned is how the OCR was performed. same as Reuters DocumentCloud service, or open source tool, or ?
next bigsun update will demonstrate this challenge better, as i am using a handful of techniques for text extraction, character recognition, and annotation, as well. in a sense, this is how the sausage making gets started...
https://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_so... Or cheap labor from $thirdworld? Crowdsourcing? For that matter, funding from special interests for a dedicated natural language team would probably not be too hard to find if their ROI for input to later analysis was good.
(i will see if there is a convenient way i can feed back out again, like to nsa-observer, since bigsun is intended to be operated entirely within hidden services - no public services, especially not github or document cloud)
I see no problem with running cool projects even exclusively within darknets. Announcements/links will find their way out to clearnet. Those who wish to join or read will do so and be exposed to learning and running some new privacy/crypto tech needed to get to it as a byproduct. It's a win. More people should do it for their related projects. And so long as the darknets can be made to scale, in general.
On Fri, Jan 30, 2015 at 11:40 PM, grarpamp <grarpamp@gmail.com> wrote:
https://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_so...
https://www.google.com/search?q="(pdftotxt|pdf2txt)" http://www.unixuser.org/~euske/python/pdfminer/ Note the "Sudden resurge of interests". As a hack, you could just automatically iterate over each page/slide of whatever doctype, display and screenshot it, and shove the screenshots through OCR for fun.
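A minimal sketch of that render-then-OCR hack for PDFs, in Python. Instead of literally displaying and screenshotting, it assumes the poppler-utils `pdftoppm` and `tesseract` command-line tools are installed and uses `pdftoppm` to rasterize each page. This is an illustrative choice, not what nsa-observer or bigsun actually used; other doctypes would need converting to PDF (or to images) first.

```python
import glob
import os
import shutil
import subprocess
import tempfile


def render_cmd(pdf_path: str, out_prefix: str, dpi: int = 300) -> list[str]:
    # pdftoppm (poppler-utils) writes one PNG per page: <out_prefix>-<N>.png
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_cmd(image_path: str) -> list[str]:
    # Passing "stdout" as the output base makes tesseract print the text
    # instead of writing a .txt file
    return ["tesseract", image_path, "stdout"]


def page_number(png_path: str) -> int:
    # "/tmp/x/page-07.png" -> 7; sort numerically so page 10 never lands before page 2
    return int(os.path.splitext(png_path)[0].rsplit("-", 1)[1])


def ocr_pdf(pdf_path: str, dpi: int = 300) -> list[str]:
    """Rasterize every page of pdf_path and OCR it; returns one text string per page."""
    for tool in ("pdftoppm", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(render_cmd(pdf_path, prefix, dpi), check=True)
        pages = sorted(glob.glob(prefix + "-*.png"), key=page_number)
        return [
            subprocess.run(ocr_cmd(p), check=True, capture_output=True, text=True).stdout
            for p in pages
        ]
```

Usage: `ocr_pdf("slides.pdf")` (a hypothetical filename) returns a list of page texts, ready for cleanup or annotation. Raw OCR of slide decks is noisy, so expect to combine it with direct text extraction (pdfminer, pdftotext) where the PDF still has a text layer.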
On 1/30/15, grarpamp <grarpamp@gmail.com> wrote:
... Or cheap labor from $thirdworld? Crowdsourcing?
this is a longer story, but yes, i'm using a handful of all of the above. some are better than others. (and some proprietary tools, as well)
For that matter, funding from special interests for a dedicated natural language team would probably not be too hard to find if their ROI for input to later analysis was good.
i'm no good at funding, but cursory efforts were not productive. part of my problem is my focus on SIGINT/NatSec, when general purpose tools would suffice. no one wants to touch the hot potato unless they're already knee deep in the mash.
I see no problem with running cool projects even exclusively within darknets. Announcements/links will find their way out to clearnet. Those who wish to join or read will do so and be exposed to learning and running some new privacy/crypto tech needed to get to it as a byproduct. It's a win. More people should do it for their related projects. And so long as the darknets can be made to scale, in general.
scaling distribution of tens of gigs of reference materials is a challenge. technically it is working, but usability needs some help... (next dist should be easier to mirror) thanks grarpamp! best regards,
On Sat, Jan 31, 2015 at 4:43 AM, coderman <coderman@gmail.com> wrote:
scaling distribution of tens of gigs of reference materials is a challenge. technically it is working, but usability needs some help... (next dist should be easier to mirror)
Moving 10G/day/node to or from clearnet is possible. Posting in the darknet might find you parallel armies of sympathetic nodes willing to help with the task.
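To put "10G/day/node" in perspective, a back-of-the-envelope sketch, assuming G means gigabytes (10^9 bytes) and ignoring darknet protocol overhead and retransmissions: 10 GB/day works out to under 1 Mbit/s sustained, and splitting a tens-of-gigs dump across several such nodes cuts the wall-clock time proportionally. The 40 GB archive size and 4-node count below are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_mbit_per_s(gb_per_day: float) -> float:
    # GB/day -> bits/day -> bits/s -> Mbit/s (decimal units throughout)
    return gb_per_day * 1e9 * 8 / SECONDS_PER_DAY / 1e6


def days_to_move(total_gb: float, gb_per_day_per_node: float, nodes: int) -> float:
    # Ideal parallel split: each node carries an equal share, no overhead or failures
    return total_gb / (gb_per_day_per_node * nodes)


print(f"10 GB/day ~ {sustained_mbit_per_s(10):.2f} Mbit/s")         # 0.93 Mbit/s
print(f"40 GB over 4 nodes: {days_to_move(40, 10, 4):.1f} days")    # 1.0 days
```

The per-node rate is modest, which is why parallel mirrors matter more than any single fast node.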
Just don't forget to use markov chains + blot-width inference to fill in the censored portions. :)
-- Twitter: @onetruecathal Phone: +353876363185 miniLock: JjmYYngs7akLZUjkvFkuYdsZ3PyPHSZRBKNm6qTYKZfAM peerio.com: Use email or phone. Uses above miniLock key.
On Friday, 30 January 2015 13:17:36, grarpamp wrote:
https://www.nsa-observer.net/ https://github.com/nsa-observer/
Wonder how long it will take GitHub to start taking down such "problematic" projects.
fyi, coderman et al.
Cool. -- Regards, Michał "rysiek" Woźniak Changing my GPG key :: http://rys.io/pl/147 GPG Key Transition :: http://rys.io/en/147
On Sat, Jan 31, 2015 at 03:14:26PM +0100, rysiek wrote:
Wonder how long it will take GitHub to start taking down such "problematic" projects.
and rightly so! wtf do you need js for to access this content? surely only to get a foothold on interested parties' hosts. -- otr fp: https://www.ctrlc.hu/~stef/otr.txt
On Sat, Jan 31, 2015 at 03:39:25PM +0100, stef wrote:
and rightly so! wtf do you need js for accessing this content, surely only to get a foothold in the interested parties host.
indeed, why js? this certainly can be implemented server side or as static pages. mozilla (tm) (r) (inc) pissed off the creator of js (BE) over a made-up gay scandal shortly after he came to power.
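The static option is easy to illustrate: a minimal sketch, in Python, that renders a browsable document index as plain HTML with no JavaScript at all. The document titles and paths are hypothetical, and `render_index` is an illustrative helper, not part of nsa-observer.

```python
from html import escape


def render_index(docs: list[tuple[str, str]], title: str = "Document index") -> str:
    """Build a self-contained, script-free HTML page linking to each (name, url) document."""
    items = "\n".join(
        # escape() neutralizes &, <, > and quotes so titles/paths can't inject markup
        f'<li><a href="{escape(url, quote=True)}">{escape(name)}</a></li>'
        for name, url in docs
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body><h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


docs = [  # hypothetical entries
    ("Example slide deck", "docs/example-slides.pdf"),
    ("Q&A notes", "docs/qa-notes.pdf"),
]
print(render_index(docs))
```

Regenerating the page whenever the collection changes keeps hosting down to a plain static file server, which also works behind a hidden service.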
participants (6)
- Cathal Garvey
- coderman
- Georgi Guninski
- grarpamp
- rysiek
- stef