DOSIR

Thu Nov 26 01:22:36 PST 2015

On Wed, Nov 25, 2015 at 8:31 PM, grarpamp <grarpamp at gmail.com> wrote:
>...
> Post links to your datasets so others can preserve, share, analyze,
> journalize / blog / report / publish about, etc.

http://www.wired.com/2015/11/google-open-sourcing-tensorflow-shows-ais-future-is-data-not-code/

Google Open-Sourcing TensorFlow Shows AI’s Future Is Data, Not Code

When Google open sourced its artificial intelligence engine last
week—freely sharing the code with the world at large—Lukas Biewald
didn’t see it as a triumph of the free software movement. He saw it as
a triumph of data.

That’s how you’d expect him to see it. He’s the CEO of the San
Francisco startup CrowdFlower, which helps online companies like
Twitter juggle massive amounts of data. But after spending time at the
Stanford AI Lab, he knows artificial intelligence. And his point is a
good one.

In open sourcing the TensorFlow AI engine, Biewald says, Google showed
that, when it comes to AI, the real value lies not so much in the
software or the algorithms as in the data needed to make it all
smarter. Google is giving away the other stuff, but keeping the data.

“As companies become more data-driven, they feel more comfortable open
sourcing lots of [software]. They know they’re sitting on lots of
proprietary data that nobody else has access to,” says Biewald, who
also worked at Yahoo as a search engineer and helped bootstrap a
notable search startup called Powerset, now owned by Microsoft. “What
they’re not opening up is their data. They would never do that.”

Making Machines Smarter

Biewald compares this to IBM’s recent purchase of The Weather Channel,
where Big Blue paid millions largely to acquire data it could use to
feed its AI ambitions. “It’s interesting that while companies are
buying data, they’re open-sourcing their algorithms,” he says. “It’s
pretty clear where these companies’ bets are, in terms of what matters
for machine learning.”

TensorFlow, you see, deals in a form of AI called deep learning. With
deep learning, you teach systems to perform tasks such as recognizing
images, identifying spoken words, and even understanding natural
language by feeding data into vast neural networks, connected machines
that approximate the web of neurons within the human brain. If you
feed photos of cats into a neural net, you can teach it to recognize
cats. If you feed it conversational data, you can teach it to carry on
conversations.

The algorithms that drive these neural networks aren’t new. They date
to the 1980s. What’s new is that, thanks to the Internet, their
creators have the processing power and the enormous amounts of data to
make these algorithms viable. To teach a system to recognize a cat,
you need an awful lot of machines and an awful lot of cat photos.

After the rise of cloud computing, in which companies like Amazon and
Microsoft rent access to the vast processing power of the net, we all
have access to vast arrays of machines. But the richest data sits
inside massive companies like Google and Facebook. Billions of people
use their services, which trade in a rich trove of information, from
text to photos to videos to speech and beyond. Both companies are hard
at work building powerful AI software. But their real competitive edge
comes from having a vast quantity of high-quality data they can use to
teach this software to “think” more like a human.

To be sure, Biewald is exaggerating (at least a bit) to make a point.
Though Google has open sourced some very important pieces of its AI
engine, it’s keeping other pieces to itself (at least for now). What
also matters in the competitive space is talent. Though the algorithms
that drive this technology are old, they evolve at a rapid
pace, moving into more and more areas, and this evolution is driven by
some very smart people.

That’s one of the reasons Google open sourced TensorFlow. If people
beyond the company can use its software, Google can more easily bring
talent and ideas into the company—and its software. It also can
continue to work with people who have left the company. “We have a lot
of summer interns coming in and they do a lot of interesting research
while they are here at Google,” says Jeff Dean, one of the Google
engineers at the heart of the company’s AI work. “For some kinds of
problems, they can basically just take their work and continue
developing it on the open source release of TensorFlow.”

But there’s another reason Google can attract the top deep learning
researchers: its data. The same goes for Facebook and other Internet
giants. In recent years, many of the field’s top researchers already
have joined these companies, including University of Toronto professor
Geoff Hinton (now at Google), New York University professor Yann LeCun
(now at Facebook), and Stanford professor Andrew Ng (now at Chinese
search giant Baidu).

As Biewald points out, you can’t necessarily get access to the same
data if you’re an academic. “It’s kinda hard for academics and
startups to do really meaningful machine learning work,” he says,
“because they don’t have access to the same kind of datasets that a
Google or an Apple would have.”

Yes, Apple generates lots of data too, through services like Siri. But
some feel Apple could be at a disadvantage because, after taking a
more extreme stance on privacy than Google and Facebook, it more
tightly restricts how its engineers can make use of the data they do
have. That’s how important digital information is to this movement.
Ken Forbus, a professor of computer science at Northwestern University
who specializes in AI, believes Apple may have to rely more heavily on
technologies beyond the deep learning realm because of its stance
on privacy.

There are many ways Apple can work around this, including changing its
privacy policies. Like Google and others, it has acquired its own deep
learning startups, and it has attracted AI talent in other ways. But
one thing is indisputable: The future of AI can’t happen without the
data.