Scientists Work on Software to Scan Arabic

The New York Times

January 27, 2005

BUFFALO, N.Y. (AP) -- Computer scientists are developing software to scan
Arabic documents, including handwritten ones, for specific words and
phrases, filling a void that became apparent following the Sept. 11 attacks.

Besides helping with intelligence gathering, the software should expand
access to modern and ancient Arabic manuscripts. It will allow Arabic
writings to be digitized and posted on the Web.

``The whole Internet is skewed toward people who speak English,'' said Venu
Govindaraju, director of the Center for Unified Biometrics and Sensors at
the University at Buffalo, where the software is being developed.

Govindaraju fears that if optical character recognition software isn't
developed for a particular language, ``then all the classic texts in that
language will disappear into oblivion.''

Bill Young, an Arabic language specialist at the University of Maryland, said
the software could help scan through masses of typed pages for specific
names or words, though he cautioned that handwritten Arabic presents
serious challenges for computers.

For instance, the word mas'uul, meaning responsible, can be written in more
than one way, he said. So the software would have to be given instructions
about possible variations.
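
A minimal sketch, in Python, of what such instructions might look like. It
assumes, purely for illustration, that the variation lies in which letter
carries the hamza; the article does not say which spellings Young had in mind.
The names HAMZA_FOLD, fold_variants and find_word are hypothetical and are not
taken from the Buffalo software.

```python
# Assumed variant table: map each hamza-carrying letter to a single stand-in,
# so that alternative spellings of one word collapse to the same search key.
HAMZA_FOLD = str.maketrans({
    "\u0624": "\u0621",  # waw with hamza above  -> bare hamza
    "\u0626": "\u0621",  # yeh with hamza above  -> bare hamza
    "\u0623": "\u0627",  # alef with hamza above -> plain alef
    "\u0625": "\u0627",  # alef with hamza below -> plain alef
})


def fold_variants(word: str) -> str:
    """Return a search key that is identical for the assumed spelling variants."""
    return word.translate(HAMZA_FOLD)


def find_word(query: str, text: str) -> list[int]:
    """Return positions of whitespace-separated tokens matching the query."""
    key = fold_variants(query)
    return [i for i, token in enumerate(text.split())
            if fold_variants(token) == key]


# Two spellings of mas'uul that differ only in the hamza seat (an assumption).
spelling_a = "\u0645\u0633\u0624\u0648\u0644"
spelling_b = "\u0645\u0633\u0626\u0648\u0644"
print(fold_variants(spelling_a) == fold_variants(spelling_b))
```

Folding both spellings to one key lets a single query match either. The sketch
works on text that has already been typed or recognized; turning handwriting
into characters is the harder step and is not attempted here.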

Govindaraju, who helped develop software to recognize handwritten addresses
in English, said the Arabic software would take into account the fact that
characters may take different forms depending on where within a word they
appear, and that Arabic vowels are pronounced but often not written.
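
Both points can be illustrated on text that is already in digital form. The
Python sketch below is not the Buffalo group's method; it only uses the
standard unicodedata module to show that one letter can have several
position-dependent forms and that vowel marks can be dropped for searching.
The function name search_key is hypothetical.

```python
import unicodedata


def search_key(text: str) -> str:
    """Fold positional letter forms and drop vowel marks for matching."""
    # NFKC maps Arabic presentation forms (isolated, final, initial, medial)
    # back to the single base letter they all represent.
    folded = unicodedata.normalize("NFKC", text)
    # Short-vowel marks such as fatha, damma and kasra are nonspacing marks
    # (Unicode category "Mn"); removing them leaves the consonant skeleton.
    return "".join(ch for ch in folded if unicodedata.category(ch) != "Mn")


# The letter beh in its isolated, final, initial and medial forms.
beh_forms = ["\ufe8f", "\ufe90", "\ufe91", "\ufe92"]
print({search_key(form) for form in beh_forms} == {"\u0628"})

# A word written with its vowel marks, and the same word without them.
with_vowels = "\u0643\u064e\u062a\u064e\u0628\u064e"
without_vowels = "\u0643\u062a\u0628"
print(search_key(with_vowels) == without_vowels)
```

A recognizer for scanned or handwritten pages faces the reverse problem: it
must decide which base letter a positional shape stands for, and it cannot rely
on vowel marks being present.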

