Memex Oil Gush

stef s at ctrlc.hu
Mon Apr 20 07:46:23 PDT 2015


On Mon, Apr 20, 2015 at 10:20:28AM -0400, grarpamp wrote:
> Some memex bits now open sourced...
> 
> http://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/

> TJBatchExtractor is what’s going open source today. It allows a user to
> extract data, such as a name, organisation or location, from advertisements.

this sounds interesting. so far there was open-calais from reuters which did
this, but only as a centralized service, even if gratis. otherwise you could
build your own corpora if your domain is not covered by the widely available
ones. however there are lots of problems with non-english names. for
evaluating such entity extractors i recommend testing them with a data set
containing eu public officials, with names in greek, bulgarian, some
latin-speaking country and some slavic-speaking one. that gives you something
that can confuse such entity extraction quite sufficiently. i guess i'm gonna
give this a test, maybe it's better. but i guess this again mostly depends on
the corpus.
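
a rough sketch of the kind of test i mean, in python. everything here is an
assumption for illustration: the names are made-up placeholders and not real
officials, naive_extractor() is a deliberately dumb ascii-only stand-in and not
TJBatchExtractor or open-calais, and score() is plain set-based
precision/recall. swap in the real extractor and a real gold list to get
numbers that mean anything.

```python
import re
import unicodedata

# made-up placeholder names in several scripts (assumption, not real data)
GOLD = [
    "John Smith",
    "Γιώργος Παπαδόπουλος",
    "Иван Петров",
    "José García",
    "Václav Novák",
]

SAMPLE = ("Present were John Smith, Γιώργος Παπαδόπουλος, Иван Петров, "
          "José García and Václav Novák.")


def normalize(name):
    # NFC-normalize and casefold so composed/decomposed forms compare equal
    return unicodedata.normalize("NFC", name).casefold().strip()


def naive_extractor(text):
    # hypothetical stand-in: runs of two or more capitalized ascii-only words
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)+", text)


def score(extracted, gold):
    # set-based precision and recall over normalized names
    ext = {normalize(n) for n in extracted}
    gld = {normalize(n) for n in gold}
    hits = len(ext & gld)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(gld) if gld else 0.0
    return precision, recall


if __name__ == "__main__":
    found = naive_extractor(SAMPLE)
    precision, recall = score(found, GOLD)
    print("found:", found)
    print("precision: %.2f recall: %.2f" % (precision, recall))
```

the stand-in only finds the plain ascii name, so precision looks fine while
recall is terrible, which is exactly the failure mode a mixed-script data set
is supposed to expose.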

-- 
otr fp: https://www.ctrlc.hu/~stef/otr.txt


