intelligence agencies tracking metaphors

1 Jun 2011

      http://www.theatlantic.com/technology/archive/2011/05/why-are-spy-researcher...

Why Are Spy Researchers Building a 'Metaphor Program'?

By Alexis Madrigal

May 25 2011, 4:19 PM ET

A small research arm of the U.S. government's intelligence establishment
wants to understand how speakers of Farsi, Russian, English, and Spanish see
the world by building software that automatically evaluates their use of
metaphors.

That's right, metaphors, like Shakespeare's famous line, "All the world's a
stage," or more subtly, "The darkness pressed in on all sides." Every speaker
in every language in the world uses them effortlessly, and the Intelligence
Advanced Research Projects Activity wants to know how what we say reflects our
worldviews. They call it The Metaphor Program, and it is a unique effort
within the government to probe how a people's language reveals their mindset.

"The Metaphor Program will exploit the fact that metaphors are pervasive in
everyday talk and reveal the underlying beliefs and worldviews of members of
a culture," declared an open solicitation for researchers released last week.
A spokesperson for IARPA declined to comment at the time.

IARPA wants some computer scientists with experience in processing language
in big chunks to come up with methods of pulling out a culture's relationship
with particular concepts. "They really are trying to get at what people think
using how they talk," Benjamin Bergen, a cognitive scientist at the
University of California, San Diego, told me. Bergen is one of a dozen or so
lead researchers who are expected to vie for a research grant that could be
worth tens of millions of dollars over five years, if the teams can show
progress towards automatically tagging and processing metaphors across
languages.

"IARPA grants are big," said Jennifer Carter of Applied Research Associates,
a 1,600-strong research company that may throw its hat in the Metaphor ring
after winning a lead research spot in a separate IARPA solicitation. While no
one knows the precise value of the IARPA grants, and the contracts are
believed to vary widely, they tend to support several large teams of
multidisciplinary researchers, Carter said. The awards, which would
initially go to several teams, could range into the five digits annually.
"Generally what happens... there will be a 'downselect' each year, so maybe
only one team will get money for the whole program," she said.*

All this to say: The Metaphor Program may represent a nine-figure investment
by the government in understanding how people use language. But that's
because metaphor studies aren't light or frilly and IARPA isn't afraid of
taking on unusual sounding projects if they think they might help
intelligence analysts sort through and decode the tremendous amounts of data
pouring into their minds.

In a presentation to prospective research "performers," as they're known, The
Metaphor Program's manager, Heather McCallum-Bayliss gave the following
example of the power of metaphors in political discussions. Her slide reads:

    Metaphors shape how people think about complex topics and can influence
    beliefs. A study presented participants with a report on crime in a city;
    they were asked how crime should be addressed in the city. The report
    contained statistics, including crime and murder rates, as well as one of
    two metaphors, CRIME AS A WILD BEAST or CRIME AS A VIRUS. The participants
    were influenced by the embedded metaphor...

McCallum-Bayliss appears to be referring to a 2011 paper published in
PLoS ONE, "Metaphors We Think With: The Role of Metaphor in Reasoning," lead
authored by Stanford's Paul Thibodeau. In that case, if people were given the
crime-as-a-virus framing, they were more likely to suggest social reform and
less likely to suggest more law enforcement or harsher punishments for
criminals. The differences generated by the metaphor alternatives "were
larger than those that exist between Democrats and Republicans, or between
men and women," the study authors noted.

Every writer (and reader) knows that there are clues to how people think and
ways to influence each other through our use of words. Metaphor researchers,
of whom there are a surprising number and variety, have formalized many of
these intuitions into whole branches of cognitive linguistics using studies
like the one outlined above (more on that later). But what IARPA's project
calls for is the deployment of spy resources against an entire language.
Where you or I might parse a sentence, this project wants to parse, say, all
the pages in Farsi on the Internet looking for hidden levers into the
consciousness of a people.

"The study of language offers a strategic opportunity for improved
counterterrorist intelligence, in that it enables the possibility of
understanding of the Other's perceptions and motivations, be he friend or
foe," the two authors of Computational Methods for Counterterrorism wrote.
"As we have seen, linguistic expressions have levels of meaning beyond the
literal, which it is critical to address. This is true especially when
dealing with texts from a high-context traditionalist culture such as those
of Islamic terrorists and insurgents."

In the first phase of the IARPA program, the researchers would simply try to
map from the metaphors a language used to the general affect associated with
a concept like "journey" or "struggle." These metaphors would then be stored
in the metaphor repository. In a later stage, the Metaphor Program scientists
will be expected to help answer questions like, "What are the perspectives of
Pakistan and India with respect to Kashmir?" by using their metaphorical
probes into the cultures. Perhaps, a slide from IARPA suggests, metaphors can
tell us something about the way Indians and Pakistanis view the role of
Britain or the concept of the "nation" or "government."

The assumption is that common turns of phrase, dissected and reassembled
through cognitive linguistics, could say something about the views of those
citizens that they might not be able to say themselves. The language of a
culture as reflected in a bunch of text on the Internet might hide secrets
about the way people think that are so valuable that spies are willing to pay
for them.

More Than Words

IARPA is modeled on the famed DARPA -- progenitors of the
Internet among other wonders -- and tasked with doing high-risk, high-reward
research for the many agencies, the NSA and CIA among them, that make up the
American intelligence-gathering force. IARPA is, as you might expect, a
low-profile organization. Little information is available from the
organization aside from a couple of interviews that its administrator, Lisa
Porter, a former NASA official, gave back in 2008 to Wired and IEEE Spectrum.
Neither publication can avoid joking that the agency is like James Bond's
famous research crew, but it turns out that the place is more likely to use
"cloak-and-dagger" in a sentence than in actual combat with supervillainy.

A major component of the agency's work is data mining and analysis. IARPA is
split into three program offices with distinct goals: Smart Collection "to
dramatically improve the value of collected data from all sources"; Incisive
Analysis "to maximize insight from the information we collect, in a timely
fashion"; and Safe & Secure Operations "to counter new capabilities
implemented by our adversaries that would threaten our ability to operate
freely and effectively in a networked world." The Metaphor Program falls
under the office of Incisive Analysis and is headed by the aforementioned
McCallum-Bayliss, a former technologist at Lockheed Martin and IBM, who
co-filed several patents relating to the processing of names in databases.

Incisive Analysis has put out several calls for other projects. They range
widely in scope and domain. The Babel Program seeks to "demonstrate the
ability to generate a speech transcription system for any new language within
one week to support keyword search performance for effective triage of
massive amounts of speech recorded in challenging real-world situations."
ALADDIN aims to create software to automatically monitor massive amounts of
video. The FUSE Program is trying to "develop automated methods that aid in
the systematic, continuous, and comprehensive assessment of technical
emergence" using the scientific and patent literature.

All three projects are technologically exciting, but none of those projects
has the poetic ring nor the smell of humanity of The Metaphor Program. The
Metaphor Program wants to understand what human beings mean through the
unvoiced emotional inflection of our words. That's normally the work of an
examined life, not a piece of spy software.

There is some precedent for the work. It comes from two directions: cognitive
linguistics and natural language processing. On the cognitive linguistic
side, George Lakoff and Mark Johnson of the University of California,
Berkeley did the foundational work, notably in their 1980 book, Metaphors We
Live By. As summarized recently by Zoltán Kövecses in his book, Metaphor: A
Practical Introduction, Lakoff and Johnson showed that metaphors weren't just
the devices of writers but rather "a valuable cognitive tool without which
neither poets nor you and I as ordinary people could live."

In this school of cognitive linguistics, we need to use more embodied,
concrete domains in order to describe more abstract ones. Researchers
assembled the linguistic expressions we use like "That class gave me food for
thought" and "His idea was half-baked" into a construct called a "conceptual
category." These come in the form of awesomely simple sentences like "Ideas
Are Food." And there are whole great lists of them. (My favorites: Darkness
Is a Solid; Time Is Something Moving Toward You; Happiness Is Fluid In a
Container; Control Is Up.) The conceptual categories show that humans use one
domain ("the source") to describe another ("the target"). So, take Ideas Are
Food: thinking is preparing food and understanding is digestion and believing
is swallowing and learning is eating and communicating is feeding. Put
simply: We import the logic of the source domain into the target domain.

Below, you can check out how one, "Ideas Are Food," is expressed, or skip
past the gallery to the rest of the story.

The main point here is that metaphors, in this sense, aren't soft or literary
in any narrow sense. Rather, they are a deep and fundamental way that humans
make sense of the world. And unfortunately for spies who want to filter the
Internet to look for dangerous people, computers can't make much sense out of
sentences like, "We can make beautiful music together," which Google
translates as something about actually playing music when, of course, it
really means, "We can be good together." (Or as the conceptual category would
phrase it: "Interpersonal Harmony Is Musical Harmony.")

While some of the underlying structures of the metaphors -- the conceptual
categories -- are near universal (e.g. Happy Is Up), there are many
variations in their range, elaboration, and emphasis. And, of course, not
every category is universal. For example, Kövecses points to a special
conceptual category in Japanese centered around the hara, or belly, "Anger Is
(In The) Hara." In Zulu, one finds an important category, "Anger Is
(Understood As Being) In the Heart," which would be rare in English.
Alternatively, while many cultures conceive of anger as a hot fluid in a
container, it's in English that we "blow off steam," a turn of phrase that
wouldn't make sense in Zulu.

These relationships have been painstakingly mapped by human analysts over the
last 30 years and they represent a deep culturolinguistic knowledge base. For
the cognitive linguistic school, all of these uses of language reveal
something about the way the people of a culture understand each other and the
world. And that's really the target of the metaphor program, and what makes
it unprecedented. They're after a deeper understanding of the way people use
words because the deep patterns encoded in language may help intelligence
analysts understand the people, not just the texts.

For Lakoff, it's about time that the government started taking metaphor
seriously. "There have been 30 years of neglect of current linguistics in all
government-sponsored research," he told me. "And finally there is somebody in
the government who has managed to do something after many years of trying."

UC San Diego's Bergen agreed. "It's a totally unique project," he said. "I've
never seen anything like it."

But that doesn't mean it's going to be easy to create a system that can
automatically deduce Americans' biases about education from a statement
like "The teacher spoon-fed the students."

Lakoff contends that it will take a long, sustained effort by IARPA (or
anyone else) to complete the task. "The quick-and-dirty way" won't work, he
said. "Are they going to do a serious scientific account?"

Building a Metaphor Machine

The metaphor problem is particularly difficult because we don't even know
what the right answers to our queries are, Bergen said. 

"If you think about other sorts of automation of language processing, there
are right answers," he said. "In speech recognition, you know what the word
should be. So you can do statistical learning. You use humans, tag up a
corpus and then run some machine learning algorithms on that. Unfortunately,
here, we don't know what the right answers are."

For one, we don't really have a stable way of telling what is and what is not
metaphorical language. And metaphorical language is changing all the time.
Parsing text for metaphors is tough work for humans and we're made for it.
The kind of intensive linguistic analysis that's made Lakoff and his students
(of whom Bergen was one) famous can take a human two hours for every 500
words on the page.

But it's that very difficulty that makes people want to deploy computing
resources instead of human beings. And they do have some directions that they
could take. James Martin of the University of Colorado played a key role in
the late 1980s and early 1990s in defining the problem and suggesting a
solution. Martin contended "the interpretation of novel metaphors can be
accomplished through the systematic extension, elaboration, and combination
of knowledge about already well-understood metaphors," in a 1988 paper.

What that means is that within a given domain -- say, "the family" in Arabic
-- you can start to process text around that. First you'll have humans go in
and tag up the data, finding the metaphors. Then, you'd use what they learned
about the target domain "family" to look for metaphorical words that are
often associated with it. Then, you run permutations on those words from the
source domain to find other metaphors you might not have found before. Eventually
you build up a repository of metaphors in Arabic around the domain of family.
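
The loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Martin's system or anything IARPA has specified: the seed tags, the RELATED lexicon standing in for "permutations" on source-domain words, the English example sentences, and every function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-tagged seeds: (target-domain word, source-domain word).
SEED_TAGS = [("family", "tree"), ("family", "roots")]

# Hypothetical lexicon of words related to each seed source word.
# It stands in for the "permutations" step described in the text.
RELATED = {
    "tree": {"branch", "trunk"},
    "roots": {"soil", "uprooted"},
}

# Hypothetical surface forms that signal the target domain.
TARGET_WORDS = {"family", "families"}


def tokenize(sentence):
    """Lowercase a sentence and strip surrounding punctuation from each token."""
    return [w.strip(".,;:!?\"'").lower() for w in sentence.split()]


def expand_sources(seed_tags, related):
    """Collect the seed source words plus the words related to them."""
    sources = set()
    for _target, source in seed_tags:
        sources.add(source)
        sources |= related.get(source, set())
    return sources


def build_repository(corpus, seed_tags, related, target_words):
    """Map each source word to the sentences where it co-occurs with a target word."""
    sources = expand_sources(seed_tags, related)
    repository = defaultdict(list)
    for sentence in corpus:
        tokens = set(tokenize(sentence))
        if tokens & target_words:
            for word in sorted(tokens & sources):
                repository[word].append(sentence)
    return dict(repository)


corpus = [
    "Every branch of the family gathered for the wedding.",
    "The war uprooted thousands of families.",
    "The family ate dinner at six.",
    "He pruned the tree in the yard.",
]
repo = build_repository(corpus, SEED_TAGS, RELATED, TARGET_WORDS)
```

Co-occurrence only produces candidates. A sentence that mentions both a family and a literal tree would land in the repository too, which is one reason humans stay in the loop.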

Of course, that's not exactly what IARPA's looking for, but it's where the
research teams will be starting. To get better results, they will have to
start to learn a lot more about the relationships between the words in the
metaphors. For Lakoff, that means understanding the frames and logics that
inform metaphors and structure our thinking as we use them. For Bergen, it
means refining the rules by which software can process language. There are
three levels of analysis that would then be combined. First, you could know
something about the metaphorical bias of an individual word. Crossroads, for
example, is generally used in metaphorical terms. Second, words in close
proximity might generate a bias, too. "Knockout in the same clause as 'she'
has a much higher probability of being metaphorical than if it's in close
proximity to 'he,'" Bergen offered as an example. Third, for certain topics,
certain words become more active for metaphorical usage. The economy's
movement, for example, probably maps to a source domain of motion through
space. So, "accelerate" used to describe something about the economy is
probably metaphorical. Create a statistical model to combine the outputs of those
three processes and you've got a brute-force method for identifying metaphors
in a text.
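
One way to picture that combination is a small logistic model over the three signals. The sketch below is only an assumption about what such a model could look like: the bias tables, the weights, and the intercept are invented numbers, and a real system would presumably fit them to human-tagged text.

```python
import math

# All numbers below are invented for illustration, not measured values.
WORD_BIAS = {"crossroads": 0.9, "accelerate": 0.4, "knockout": 0.5}
CONTEXT_BIAS = {("knockout", "she"): 0.8, ("knockout", "he"): 0.2}
TOPIC_BIAS = {("accelerate", "economy"): 0.9}

# Hypothetical weights; in practice these would be learned from labeled data.
WEIGHTS = {"word": 2.0, "context": 2.0, "topic": 2.0}
INTERCEPT = -3.0
NEUTRAL = 0.5  # score used when a table has no entry


def metaphor_probability(word, clause_words=(), topic=None):
    """Combine word, context, and topic signals with a logistic function."""
    word_score = WORD_BIAS.get(word, NEUTRAL)
    context_scores = [
        CONTEXT_BIAS[(word, other)]
        for other in clause_words
        if (word, other) in CONTEXT_BIAS
    ]
    context_score = max(context_scores, default=NEUTRAL)
    topic_score = TOPIC_BIAS.get((word, topic), NEUTRAL)
    z = (
        INTERCEPT
        + WEIGHTS["word"] * word_score
        + WEIGHTS["context"] * context_score
        + WEIGHTS["topic"] * topic_score
    )
    return 1.0 / (1.0 + math.exp(-z))


p_she = metaphor_probability("knockout", clause_words=["she"])
p_he = metaphor_probability("knockout", clause_words=["he"])
p_economy = metaphor_probability("accelerate", topic="economy")
p_no_topic = metaphor_probability("accelerate")
```

With these made-up tables, "knockout" scores higher next to "she" than next to "he," and "accelerate" scores higher when the topic is the economy, which mirrors the examples in the text.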

In this particular competition, there will be more nuanced approaches based
on parsing the more general relationships between words in text: sorting out
which are nouns and how they connect to verbs, etc. "If you have that
information, then you can find parts of sentences that don't look like they
should be there," Bergen explained. A classic kind of identifier would be a
type mismatch. "If I am the verb 'smile,' I like to have a subject that has a
face," he said. If something without a face is smiling, it might be an
indication that some kind of figurative language is being employed.
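
Bergen's type-mismatch check can be sketched as a lookup against two small tables. Everything here is hypothetical: the feature names, the tiny lexicons, and the assumption that a parser has already produced (subject, verb) pairs.

```python
# Hypothetical selectional preferences: verb -> feature its subject should have.
VERB_REQUIRES = {"smile": "has_face", "drink": "animate"}

# Hypothetical noun features; a real lexicon would be far larger.
NOUN_FEATURES = {
    "child": {"has_face", "animate"},
    "fortune": set(),
    "engine": set(),
}


def is_type_mismatch(subject, verb):
    """Return True when the subject is known to lack the feature the verb expects."""
    required = VERB_REQUIRES.get(verb)
    if required is None or subject not in NOUN_FEATURES:
        return False  # unknown verb or noun: no evidence either way
    return required not in NOUN_FEATURES[subject]


# (subject, verb) pairs, assumed to come from an upstream parser.
pairs = [("child", "smile"), ("fortune", "smile"), ("engine", "drink")]
flagged = [pair for pair in pairs if is_type_mismatch(*pair)]
```

Here "fortune smiles" and "the engine drinks" are flagged as possibly figurative, while "the child smiles" passes. A flag is a hint rather than a verdict, since a mismatch can also come from a parsing error or a gap in the lexicon.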
...
From these constituent parts -- and whatever other wild stuff people cook up
-- the teams will try to build a metaphor machine that can convert a
language into underlying truths about a culture. Feed text in one end and
wait on the other end of the Rube Goldberg software for a series of beliefs
about family or America or power.

We might never be able to build such a thing. Indeed, I get the feeling that
we can't, at least not yet. But what if we can?

"Are they going to use it wisely?" Lakoff posed. "Because using it to detect
terrorists is not a bad idea, but then the question is: Are they going to use
it to spy on us?"

I don't know, but I know that as an American I think through these metaphors:
Problem Is a Target; Society Is a Body; Control Is Up.

* This section of the story was updated to more accurately reflect the intent
of Carter's statement.

Eugen Leitl
