On Fri, Jan 30, 2015 at 9:29 PM, coderman <coderman@gmail.com> wrote:
thanks, checking them out. one thing i don't see mentioned is how the OCR was performed. same as Reuters DocumentCloud service, or open source tool, or ?
next bigsun update will demonstrate this challenge better, as i am using a handful of techniques for text extraction, character recognition, and annotation. in a sense, this is how the sausage making gets started...
https://en.wikipedia.org/wiki/Comparison_of_optical_character_recognition_so...
Or cheap labor from $thirdworld? Crowdsourcing? For that matter, funding from special interests for a dedicated natural language team would probably not be too hard to find, if the ROI on its output as input to later analysis was good.
(i will see if there is a convenient way i can feed back out again, like to nsa-observer, since bigsun is intended to be operated entirely within hidden services - no public services, especially not github or document cloud)
I see no problem with running cool projects even exclusively within darknets. Announcements/links will find their way out to clearnet. Those who wish to join or read will do so and, as a byproduct, be exposed to learning and running the new privacy/crypto tech needed to get to it. It's a win. More people should do it for their related projects, so long as the darknets can be made to scale in general.