One of the first examples of those principles in action is a collaboration between Eduard Hovy of USC's Information Sciences Institute (ISI) and Professor Avigdor Gal of the Technion, the Israel Institute of Technology. They will provide part of the technological infrastructure for an EU project called "QUALEG," an acronym for "Quality of Service and Legitimacy in eGovernment," a multi-institution initiative to improve the delivery of government services.
QUALEG is a classic example of the challenges Europe faces as it creates a multi-national federated system. The countries directly involved are Poland, France and Germany. That means integrating documents written in at least three languages, most likely four, since many European documents also have English versions. It means combining different legacy software as well. It is a where-do-you-start nightmare, as tricky as trying to re-fold a map in a convertible on the Autobahn.
One of the first steps in the process is the creation of an ontology, a cross-lingual thesaurus that provides a thematic structure for all the terms one is likely to encounter. Ontologies need to be semantically sensitive: "bank" near "river" means something quite different from "bank" near "teller."
"In the context of Digital Government, ontologies play an increasingly important role, as database metadata schemas, terminology standardization structures and the foundation for interfaces between applications," says Hovy. "Yet the complexity and cost of building ontologies remains a daunting challenge."
For QUALEG, Hovy's group will provide a "starter" ontology that a machine can learn from. The ISI portion will thus become the essential back-end piece underlying the integration of the QUALEG databases. Gal's group is creating software it has dubbed "OntoBuilder." Essentially, it is a heuristic front end. Through a simple interface, users can input terms that help a topic-specific ontology learn and become more accurate.
OntoBuilder will be used by city administrators in three cities in Poland, Germany, and France to create their local ontologies. The automated ISI software should help boost their productivity, Hovy says.
The project involves at least four languages, some with declensional endings and gender-based forms, so creating an ontology would seem not only daunting but nearly impossible. Yet given his group's experience in machine translation, Hovy says this is one challenge that is under control: "There are superficial differences in looking at word forms, so you need software that gets to the root forms of each word. In machine translation, it's mostly a solved problem."
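To make "getting to the root forms" concrete, here is a deliberately tiny sketch of suffix stripping in Python. It is not the ISI software: the suffix list, the function names and the minimum stem length are all invented for illustration, and a real system for Polish, French or German would need a proper morphological analyzer.

```python
# Toy suffix stripper. SUFFIXES is a hypothetical, English-only list chosen
# for illustration; it is far smaller than what real morphology requires.
SUFFIXES = ("ations", "ation", "ings", "ing", "es", "s")


def root_form(word: str, min_stem: int = 3) -> str:
    """Strip the longest matching suffix, keeping at least min_stem letters."""
    lowered = word.lower()
    # Try longer suffixes first so "ings" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem:
            return lowered[: -len(suffix)]
    return lowered


def group_by_root(words):
    """Group surface forms under their shared (approximate) root."""
    groups = {}
    for word in words:
        groups.setdefault(root_form(word), []).append(word)
    return groups
```

Calling group_by_root(["bank", "Banks", "banking"]) files all three surface forms under the single root "bank", which is the kind of normalization an ontology needs before it can compare terms.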
But there is an additional hurdle, says Hovy: "You have overlapping meanings - this word may have four meanings in this language and seven meanings in that one." However, Hovy says, results improve markedly if you're working in a particular domain. If you know that all your terms are drawn from finance, the software will more easily translate "bank" as a financial institution and not a geographic entity.
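The effect of fixing the domain can be shown with a small Python sketch. The sense inventory below is hypothetical and hand-written for this example; it only illustrates how knowing the domain shrinks the set of candidate meanings, not how the QUALEG software stores senses.

```python
# Hypothetical sense inventory: each word maps to (domain, gloss) pairs.
SENSES = {
    "bank": [
        ("finance", "financial institution"),
        ("geography", "side of a river"),
    ],
    "bond": [
        ("finance", "debt security"),
        ("chemistry", "link between atoms"),
    ],
}


def senses_in_domain(word, domain=None):
    """Return candidate glosses for word, filtered by domain when one is given."""
    candidates = SENSES.get(word.lower(), [])
    if domain is None:
        # No domain knowledge: every sense stays in play.
        return [gloss for _, gloss in candidates]
    return [gloss for sense_domain, gloss in candidates if sense_domain == domain]
```

With no domain, senses_in_domain("bank") leaves two candidates to choose between; with domain="finance", only "financial institution" remains.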
The researchers will use a "clustering" approach. In clustering, the machine is taught to understand relationships by word co-occurrence: glass, paperweight, perfume bottle versus glass, windshield, rearview mirror. One starts with "topic signatures": for each category, a set of words weighted by relevance to that topic. Using this system, accuracy can go as high as 75%, depending on how clear and distinct the topics are, says Hovy.
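A minimal Python sketch of scoring text against topic signatures follows. The two signatures reuse the article's glass example, but the topic names, the weights and the function names are assumptions made up for illustration; how the researchers actually derive and apply their weights is not described here.

```python
from collections import Counter

# Hypothetical topic signatures: word -> relevance weight for that topic.
# "glass" appears in both topics, so it carries a low weight in each.
TOPIC_SIGNATURES = {
    "household": {"glass": 0.4, "paperweight": 0.9, "perfume": 0.8, "bottle": 0.7},
    "automotive": {"glass": 0.4, "windshield": 0.9, "rearview": 0.8, "mirror": 0.7},
}


def score_topics(tokens, signatures=TOPIC_SIGNATURES):
    """Sum signature weights over the tokens, once per occurrence."""
    counts = Counter(token.lower() for token in tokens)
    return {
        topic: sum(weight * counts[term] for term, weight in signature.items())
        for topic, signature in signatures.items()
    }


def best_topic(tokens, signatures=TOPIC_SIGNATURES):
    """Return the highest-scoring topic, or None if no signature word occurs."""
    scores = score_topics(tokens, signatures)
    topic = max(scores, key=scores.get)
    return topic if scores[topic] > 0 else None
```

Here best_topic(["glass", "windshield", "mirror"]) picks "automotive" because the distinctive words outweigh the shared one. When topics share most of their vocabulary, the scores sit close together, which is one way to see why accuracy depends on how distinct the topics are.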
The process still requires some human intervention, at least in its initial stages. "Clustering has been used since the 1960s," says Hovy, "but it's never been very accurate by itself. If you give it additional help, it can be."
At this phase, QUALEG is a pilot program that, it is hoped, can be extended throughout Europe. But Hovy's and Gal's work is equally ambitious: it could lay the groundwork for an "ontology service bureau," where those charged with database creation could have that painstaking initial step performed for them. Even if not fully automated, such a system could eliminate much of the individual time and effort that goes into ontology creation.
This site is maintained by the Digital Government Research Center at the University of Southern California's Information Sciences Institute.