Guest
Posted May 16, 2003

A New Tool for Translating Ancient, Flowing Script
By IAN AUSTEN, NYTimes.com

SANSKRIT LIVES: Software provides on-screen transliteration to Roman characters.

Sanskrit, in which classical Indian literature was composed, is among the world's oldest recorded languages. But putting works created over the last 3,000 years onto the Web has not been easy.

Documents written in Devanagari, the script used for Sanskrit and other South Asian languages, can be scanned as images. But optical character recognition, or O.C.R., software for turning Devanagari texts into digital information that can be searched and reformatted has not been commercially available.

That has not been for lack of effort. Because Devanagari is also used for widely spoken contemporary languages like Hindi, several research teams based in India are working on O.C.R. technology to capture it. But Venugopal Govindaraju, the associate director of the Center of Excellence in Document Analysis and Recognition, or Cedar, at the State University of New York at Buffalo, suggested that a lack of collaboration may have limited their efforts.

"They report their research in journals and at conferences," he said, "but they don't make the data sets they develop available to other institutions."

In an effort to accelerate the development of O.C.R. software for Devanagari (a compound word whose literal translation is "city of immortals"), Cedar and the Indian Statistical Institute are distributing a script-recognition tool that they hope will become the international standard for software that can recognize Devanagari.

Their script-recognition software, which can be downloaded free at www.cedar.buffalo.edu/ILT, can separate lines and individual characters written in the flowing script. It then offers an on-screen transliteration in Roman characters for proofreading.

Dr. Govindaraju said that in the early 1990's, Cedar gave away similar tools it had created for the United States Postal Service to analyze handwriting. "That spurred work in the Roman alphabet on handwriting recognition," he said. "There has been tremendous progress since then."

Those earlier tools allowed Cedar to develop the first successful Roman alphabet handwriting recognition system capable of operating on a mass scale. Dr. Govindaraju said the tools became the standard for comparing results among all handwriting recognition groups.

Work by those groups also gave rise to handwriting recognition programs like Graffiti 2, for users of some newer hand-held computers, and Microsoft Windows Journal, for users of the Windows XP tablet PC's. Like the Roman alphabet and some Japanese and Chinese characters, Devanagari could eventually be embraced by makers of commercial recognition software.

Working from a common base for Devanagari would make it easier for researchers to exchange data from their efforts. Most of the work being done focuses on refining the recognition software to cope with variations, a task that is likely to be more complex with Devanagari script than the Roman alphabet.

The basic Devanagari symbol set has 34 consonants and 18 vowels. But those symbols can be combined to create 600 to 700 different characters, Dr. Govindaraju said.

With optical character recognition technology, Sanskrit documents could be transformed into digital text that could be viewed on computer monitors using existing Devanagari screen fonts.

Dr. Govindaraju's efforts to keep Sanskrit alive are not limited to computers. He is part of a group of volunteers who spend time on weekends teaching children how to read the language.

"Sanskrit is a dying language," he said, "but I love Sanskrit."