Illuminating Blacked-Out Words

R. A. Hettinga rah at
Wed May 12 11:36:21 PDT 2004


The New York Times

May 10, 2004

Illuminating Blacked-Out Words

European researchers at a security conference in Switzerland last week
demonstrated computer-based techniques that can identify blacked-out words
and phrases in confidential documents.

The researchers showed their software at the conference, Eurocrypt, by
analyzing a presidential briefing memorandum released in April to the
commission investigating the Sept. 11 attacks. After analyzing the
document, they said they had high confidence the word "Egyptian" had been
blacked out in a passage describing the source of an intelligence report
stating that Osama Bin Ladin was planning an attack in the United States.

The researchers, David Naccache, the director of an information security
lab for Gemplus S.A., a Luxembourg-based maker of banking and security
cards, and Claire Whelan, a computer science graduate student at Dublin
City University in Ireland, also applied the technique to a confidential
Defense Department memorandum on Iraqi military use of Hughes helicopters.

They said that although the name of a country had been blacked out in that
memorandum, their software showed that it was highly likely the document
named South Korea as having helped the Iraqis.

The challenge of identifying blacked-out words came to Mr. Naccache as he
watched television news on Easter weekend, he said in a telephone interview
last Friday.

"The pictures of the blacked-out words appeared on my screen, and it piqued
my interest as a cryptographer," he said. He then discussed possible
solutions to the problem with Ms. Whelan, whom he is supervising as a
graduate adviser, and she quickly designed a series of software programs to
use in analyzing the documents.

Although Mr. Naccache directs a large information security laboratory at
Gemplus, he said that the research was done independently of his work
there.

The technique he and Ms. Whelan developed involves first using a program to
realign the document, which had been placed on a copying machine at a
slight angle. They determined that the document had been tilted by about
half a degree.
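
A tilt of half a degree sounds negligible, but it matters when widths are
measured in pixels. The sketch below is not the researchers' program; it
only illustrates, under assumed numbers, how far a text line drifts
vertically across a scan and how a point can be rotated back. The 2000-pixel
line width and the function names are hypothetical.

```python
import math

def skew_drift_px(line_width_px, tilt_deg):
    """Vertical drift of a text line across its width for a given tilt."""
    return line_width_px * math.tan(math.radians(tilt_deg))

def rotate_point(x, y, angle_deg, cx=0.0, cy=0.0):
    """Rotate (x, y) about (cx, cy) by angle_deg using the standard rotation matrix."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def deskew_point(x, y, tilt_deg, cx=0.0, cy=0.0):
    """Undo a tilt by rotating the same amount in the opposite direction."""
    return rotate_point(x, y, -tilt_deg, cx, cy)

if __name__ == "__main__":
    # Hypothetical scan: a text line 2000 pixels wide, tilted by 0.5 degrees.
    print(round(skew_drift_px(2000, 0.5), 2))
    tilted = rotate_point(2000.0, 0.0, 0.5)
    print(deskew_point(tilted[0], tilted[1], 0.5))
```

Under these assumptions the far end of the line sits many pixels above or
below the near end, which is more than the three-pixel tolerance the
researchers later applied to word lengths.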

Once the document was realigned, another program Ms. Whelan had written
determined that it had been set in the Arial font. Next, they found the
number of pixels that had been blacked out in
the sentence: "An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx
service at the same time that Bin Ladin was planning to exploit the
operative's access to the US to mount a terrorist strike." They then used a
computer to determine the pixel length of words in the dictionary when
written in the Arial font.
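
The width of a word in a proportional font is, to a first approximation,
the sum of the advance widths of its letters. The sketch below shows that
arithmetic. The width table is an assumption: it uses approximate
Helvetica-style metrics in units of 1/1000 em, not values read from the
Arial font file, and it ignores kerning. The font size of 50 pixels per em
is also hypothetical.

```python
# Approximate advance widths in 1/1000 em for a Helvetica-style sans-serif
# face. These values are an assumption for illustration; a real analysis
# would read the metrics from the actual font file.
ADVANCE_UNITS = {
    "a": 556, "b": 556, "c": 500, "d": 556, "e": 556, "f": 278, "g": 556,
    "h": 556, "i": 222, "j": 222, "k": 500, "l": 222, "m": 833, "n": 556,
    "o": 556, "p": 556, "q": 556, "r": 333, "s": 500, "t": 278, "u": 556,
    "v": 500, "w": 722, "x": 500, "y": 500, "z": 500,
    "A": 667, "E": 667, "I": 278, "U": 722, " ": 278,
}

def width_units(text):
    """Sum per-glyph advance widths, ignoring kerning.

    Raises KeyError for characters missing from the table.
    """
    return sum(ADVANCE_UNITS[ch] for ch in text)

def width_px(text, em_px):
    """Scale font units to pixels; em_px is the font size in pixels."""
    return width_units(text) * em_px / 1000.0

if __name__ == "__main__":
    # Hypothetical rendering size of 50 pixels per em.
    for word in ("Egyptian", "service", "an"):
        print(word, width_units(word), width_px(word, em_px=50))
```

Running the same measurement over every dictionary entry gives the table of
candidate widths that the later filtering step needs.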

The program rejected all of the words that were not within three pixels of
the length of the word that was probably under the blacked-out area in
the document.
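
The three-pixel rule amounts to a simple tolerance filter. The sketch below
is an illustration only: the width function is a toy (narrow, normal, and
wide letters at 3, 6, and 9 pixels), the word list is made up, and the
function names are hypothetical. Only the tolerance of three pixels comes
from the article.

```python
def toy_width_px(word):
    """Toy proportional width: narrow letters 3 px, wide 9 px, others 6 px."""
    narrow, wide = set("fijlt"), set("mw")
    return sum(3 if c in narrow else 9 if c in wide else 6
               for c in word.lower())

def filter_by_width(candidates, measure, target_px, tolerance_px=3):
    """Keep words whose measured width is within tolerance of the target."""
    return [w for w in candidates
            if abs(measure(w) - target_px) <= tolerance_px]

if __name__ == "__main__":
    # Hypothetical dictionary and a hypothetical redaction 28 pixels wide.
    words = ["table", "mimic", "ocean", "lit", "window", "cabin"]
    print(filter_by_width(words, toy_width_px, target_px=28))
```

The `measure` argument is passed in so that a real font-metric function
could replace the toy one without changing the filter.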

The software then reduced the number of possible words to just 7 from 1,530
by using semantic guidelines, including the grammatical context. The
researchers selected the word "Egyptian" from the seven possible words,
rejecting "Ukrainian" and "Ugandan," because those countries would be less
likely to have such information.
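
The article does not describe the semantic rules the software used. As one
hedged example of a grammatical constraint, the sketch below checks that a
candidate agrees with the article "an" that precedes the blacked-out word.
The letter-based vowel test and the candidate list are assumptions for
illustration; only "Egyptian," "Ukrainian," and "Ugandan" appear in the
reporting.

```python
VOWELS = set("aeiou")

def agrees_with_article(article, word):
    """Crude check: 'an' wants a vowel letter next, 'a' wants a consonant."""
    starts_with_vowel = word[0].lower() in VOWELS
    if article.lower() == "an":
        return starts_with_vowel
    if article.lower() == "a":
        return not starts_with_vowel
    return True  # no constraint from other preceding words

def filter_by_context(candidates, preceding_word):
    """Keep candidates that are grammatical after the preceding word."""
    return [w for w in candidates if agrees_with_article(preceding_word, w)]

if __name__ == "__main__":
    # Hypothetical candidates that survived the width filter.
    survivors = ["Egyptian", "Ukrainian", "Ugandan", "Pakistani",
                 "Jordanian", "Israeli"]
    print(filter_by_context(survivors, "an"))
```

A rule based on spelling rather than sound is deliberately loose: it keeps
"Ukrainian" and "Ugandan," leaving the final choice to human judgment, as
the researchers did.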

After the presentation at Eurocrypt, the researchers discussed possible
measures that government agencies could take to make identifying
blacked-out words more difficult, Mr. Naccache said in the phone interview.
One possibility, he said, would be for agencies to use optical character
recognition technology to rescan documents and alter fonts.

In January, the State Department required that its documents use a more
modern font, Times New Roman, instead of Courier, Mr. Naccache said.
Because Courier is a monospace font, in which all letters are of the same
width, it is harder to decipher with the computer technique. There is no
indication that the State Department knew that.
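
The effect of a monospace font can be shown with a small comparison. In the
sketch below, both width functions are toys and the word list is made up;
the point is only that a fixed-width measure maps every word with the same
letter count to the same width, while a proportional measure separates
them.

```python
from collections import defaultdict

def mono_width_px(word, cell_px=6):
    """Monospace: every character occupies the same cell width."""
    return cell_px * len(word)

def toy_proportional_width_px(word):
    """Toy proportional width: narrow letters 3 px, wide 9 px, others 6 px."""
    narrow, wide = set("fijlt"), set("mw")
    return sum(3 if c in narrow else 9 if c in wide else 6
               for c in word.lower())

def group_by_width(words, measure):
    """Map each measured width to the list of words that share it."""
    groups = defaultdict(list)
    for w in words:
        groups[measure(w)].append(w)
    return dict(groups)

if __name__ == "__main__":
    # Hypothetical seven-letter words.
    words = ["million", "illicit", "mammoth", "cabinet"]
    print(group_by_width(words, mono_width_px))
    print(group_by_width(words, toy_proportional_width_px))
```

With the monospace measure, a width match reveals only the number of
characters, so far more dictionary words survive the first filter.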

Experts on the Freedom of Information Act said they feared the computer
technique might be used as an excuse by government agencies to release even
more restricted versions of documents.

 "They have exposed a technique that may now become less and less useful as
a result," said Steven Aftergood, a senior research analyst at the
Federation of American Scientists, of the research project. "We care
because there are all kinds of things withheld by government agencies

R. A. Hettinga <mailto: rah at>
The Internet Bearer Underwriting Corporation
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'
