<http://www.nytimes.com/2004/05/10/technology/10crypto.html?pagewanted=print&position=>

The New York Times

May 10, 2004

Illuminating Blacked-Out Words

By JOHN MARKOFF

European researchers at a security conference in Switzerland last week demonstrated computer-based techniques that can identify blacked-out words and phrases in confidential documents.

The researchers showed their software at the conference, Eurocrypt, by analyzing a presidential briefing memorandum released in April to the commission investigating the Sept. 11 attacks. After analyzing the document, they said they had high confidence the word "Egyptian" had been blacked out in a passage describing the source of an intelligence report stating that Osama Bin Ladin was planning an attack in the United States.

The researchers, David Naccache, the director of an information security lab for Gemplus S.A., a Luxembourg-based maker of banking and security cards, and Claire Whelan, a computer science graduate student at Dublin City University in Ireland, also applied the technique to a confidential Defense Department memorandum on Iraqi military use of Hughes helicopters.

They said that although the name of a country had been blacked out in that memorandum, their software showed that it was highly likely the document named South Korea as having helped the Iraqis.

The challenge of identifying blacked-out words came to Mr. Naccache as he watched television news on Easter weekend, he said in a telephone interview last Friday.

"The pictures of the blacked-out words appeared on my screen, and it piqued my interest as a cryptographer," he said. He then discussed possible solutions to the problem with Ms. Whelan, whom he is supervising as a graduate adviser, and she quickly designed a series of software programs to use in analyzing the documents.

Although Mr. Naccache directs a large information security laboratory at Gemplus, he said that the research was done independently of his work there.

The technique he and Ms. Whelan developed involves first using a program to realign the document, which had been placed on a copying machine at a slight angle. They determined that the document had been tilted by about half a degree.

By realigning the document it was possible to use another program Ms. Whelan had written to determine that it had been formatted in the Arial font. Next, they found the number of pixels that had been blacked out in the sentence: "An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx service at the same time that Bin Ladin was planning to exploit the operative's access to the US to mount a terrorist strike."

They then used a computer to determine the pixel length of words in the dictionary when written in the Arial font. The program rejected all of the words that were not within three pixels of the length of the word that was probably under the blackened-out area in the document.

The software then reduced the number of possible words to just 7 from 1,530 by using semantic guidelines, including the grammatical context. The researchers selected the word "Egyptian" from the seven possible words, rejecting "Ukrainian" and "Ugandan," because those countries would be less likely to have such information.

After the presentation at Eurocrypt, the researchers discussed possible measures that government agencies could take to make identifying blacked-out words more difficult, Mr. Naccache said in the phone interview. One possibility, he said, would be for agencies to use optical character recognition technology to rescan documents and alter fonts.

In January, the State Department required that its documents use a more modern font, Times New Roman, instead of Courier, Mr. Naccache said. Because Courier is a monospace font, in which all letters are of the same width, it is harder to decipher with the computer technique. There is no indication that the State Department knew that.

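The width-matching step described above, and the reason a monospace font blunts it, can be illustrated with a short Python sketch. This is not the researchers' software. The per-character widths are invented for illustration and are not real Arial or Courier metrics. The word list, the 60-pixel and 64-pixel targets, and the function names are likewise hypothetical. Only the three-pixel tolerance comes from the article.

```python
from typing import Callable, Iterable, List

# Hypothetical character classes for a proportional font.
# The pixel values below are invented, not real Arial metrics.
NARROW = set("fijlrtI")
WIDE = set("mwMW")


def proportional_width(ch: str) -> int:
    """Made-up advance width, in pixels, for a proportional font."""
    if ch in NARROW:
        return 4
    if ch in WIDE:
        return 12
    if ch.isupper():
        return 10
    return 8


def monospace_width(ch: str) -> int:
    """Made-up advance width for a monospace font: every glyph is equal."""
    return 8


def word_width(word: str, width_of: Callable[[str], int]) -> int:
    """Total rendered width of a word as the sum of its glyph widths."""
    return sum(width_of(ch) for ch in word)


def filter_by_width(
    words: Iterable[str],
    target_px: int,
    width_of: Callable[[str], int],
    tolerance_px: int = 3,  # the article's "within three pixels"
) -> List[str]:
    """Keep words whose width is within tolerance_px of the blacked-out span."""
    return [
        w for w in words
        if abs(word_width(w, width_of) - target_px) <= tolerance_px
    ]


# Hypothetical candidate list standing in for a dictionary.
CANDIDATES = [
    "Egyptian", "Ukrainian", "Ugandan", "Romanian", "Moroccan",
    "French", "Israeli", "German", "Jordanian",
]

if __name__ == "__main__":
    # Proportional font: assume the blacked-out span measures 60 pixels.
    print(filter_by_width(CANDIDATES, 60, proportional_width))
    # Monospace font: assume the span measures 64 pixels (eight glyphs).
    print(filter_by_width(CANDIDATES, 64, monospace_width))
```

With these invented widths, the proportional filter keeps "Egyptian", "Ukrainian" and "Ugandan", words of three different lengths whose glyph widths happen to add up to nearly the same total. The monospace filter keeps "Egyptian", "Romanian" and "Moroccan", that is, every eight-letter word in the list, because the width then reveals nothing beyond the letter count. In both cases the survivors would still have to pass the semantic filtering the article describes.
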
Experts on the Freedom of Information Act said they feared the computer technique might be used as an excuse by government agencies to release even more restricted versions of documents.

"They have exposed a technique that may now become less and less useful as a result," said Steven Aftergood, a senior research analyst at the Federation of American Scientists, of the research project. "We care because there are all kinds of things withheld by government agencies improperly."

--
-----------------
R. A. Hettinga <mailto: rah@ibuc.com>
The Internet Bearer Underwriting Corporation <http://www.ibuc.com/>
44 Farquhar Street, Boston, MA 02131 USA
"... however it may deserve respect for its usefulness and antiquity,
[predicting the end of the world] has not been found agreeable to
experience." -- Edward Gibbon, 'Decline and Fall of the Roman Empire'