Thursday, April 30, 1998
3:00 PM (refreshments 2:45)
Room NE43-941
EECS Special Seminar
Abstract
The vast amount of information now available in electronic form has led to increasing demand for applications that process natural language, such as machine translation, summarization, and information extraction. Accurate methods for parsing unrestricted text will almost certainly be a key component in these applications.
Unfortunately, the traditional approach to syntactic analysis, writing a grammar by hand, has encountered two major problems. First, ambiguity: even moderate-length sentences often receive thousands of analyses, with no indication of which is correct. Second, coverage: constructing an exhaustive grammar of English has proved extremely difficult because of the huge number of rules needed.
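To give a rough sense of how quickly the space of possible structures grows with sentence length, the short Python sketch below counts the distinct binary-branching bracketings of an n-word sentence, which is the Catalan number C(n-1). This is only an illustration under a simplifying assumption: it ignores category labels and grammar constraints, so it is not the number of analyses any particular hand-written grammar would produce. The helper name `binary_bracketings` is hypothetical.

```python
from math import comb


def binary_bracketings(n_words: int) -> int:
    """Count the distinct binary-branching trees over n_words leaves.

    Illustrative helper (hypothetical name): the count is the Catalan number
    C(n_words - 1); it ignores category labels and grammar constraints.
    """
    if n_words < 1:
        raise ValueError("a sentence needs at least one word")
    n = n_words - 1
    return comb(2 * n, n) // (n + 1)


for length in (5, 10, 15, 20):
    print(f"{length:2d} words: {binary_bracketings(length):,} bracketings")
```

For a 10-word sentence this already gives 4,862 unlabeled bracketings, and for 20 words more than 1.7 billion.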
In this talk I will describe my work on machine learning methods for parsing. A statistical model is trained from a corpus of sentences that have been annotated with syntactic structure. Competing analyses for a test sentence can then be ranked by their probability under the model; moreover, the most probable analysis can be found efficiently. I will show how careful design of the model can lead to linguistically motivated parameters and, crucially, to parameters that condition heavily on lexical information. The resulting models recover constituents in Wall Street Journal text with 88% accuracy, the best published results on this task. I will discuss information extraction, machine translation, and speech recognition as possible applications of the parser.
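As a rough illustration of the train-then-rank pipeline described above, here is a minimal Python sketch of a plain probabilistic context-free grammar (PCFG): rule probabilities are estimated by relative frequency from a tiny treebank, and a Viterbi CKY chart parser finds the most probable analysis in polynomial time. This is a deliberately simpler, unlexicalized stand-in, not the speaker's model. The three-sentence treebank, the function names `train_pcfg` and `viterbi_cky`, and the restriction to binary (Chomsky normal form) rules are all assumptions made for the example.

```python
from collections import defaultdict

# Illustrative sketch: an unlexicalized PCFG, not the lexicalized model from the talk.
# Hypothetical toy treebank in Chomsky normal form (not real WSJ data).
# A tree is (label, word) for a preterminal or (label, left, right) for a binary node.
TREEBANK = [
    # "I saw the dog with the telescope": PP attached to the VP
    ("S", ("NP", "I"),
     ("VP", ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog"))),
      ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "telescope"))))),
    # "I saw the man with the dog": PP attached to the object NP
    ("S", ("NP", "I"),
     ("VP", ("VBD", "saw"),
      ("NP", ("NP", ("DT", "the"), ("NN", "man")),
       ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "dog")))))),
    # "I saw the man with the telescope": PP attached to the VP
    ("S", ("NP", "I"),
     ("VP", ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "man"))),
      ("PP", ("IN", "with"), ("NP", ("DT", "the"), ("NN", "telescope"))))),
]


def count_rules(tree, counts):
    """Add the rules used in `tree` to counts[lhs][rhs]."""
    label = tree[0]
    if isinstance(tree[1], str):  # lexical rule: label -> word
        counts[label][(tree[1],)] += 1
    else:  # binary rule: label -> left right
        left, right = tree[1], tree[2]
        counts[label][(left[0], right[0])] += 1
        count_rules(left, counts)
        count_rules(right, counts)


def train_pcfg(treebank):
    """Relative-frequency (maximum-likelihood) estimate of P(lhs -> rhs | lhs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for tree in treebank:
        count_rules(tree, counts)
    probs = {}
    for lhs, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        for rhs, c in rhs_counts.items():
            probs[(lhs, rhs)] = c / total
    return probs


def viterbi_cky(words, probs, root="S"):
    """Return (probability, tree) of the most probable parse rooted in `root`, or None."""
    n = len(words)
    # chart[i][j] maps a label to (best probability, best subtree) spanning words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    lexical = [(lhs, rhs[0], p) for (lhs, rhs), p in probs.items() if len(rhs) == 1]
    binary = [(lhs, rhs[0], rhs[1], p) for (lhs, rhs), p in probs.items() if len(rhs) == 2]
    for i, word in enumerate(words):
        for lhs, w, p in lexical:
            if w == word:
                chart[i][i + 1][lhs] = (p, (lhs, word))
    for span in range(2, n + 1):  # build longer spans from shorter ones
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for lhs, b, c, p in binary:
                    if b in chart[i][k] and c in chart[k][j]:
                        pb, tb = chart[i][k][b]
                        pc, tc = chart[k][j][c]
                        score = p * pb * pc
                        if score > chart[i][j].get(lhs, (0.0, None))[0]:
                            chart[i][j][lhs] = (score, (lhs, tb, tc))
    return chart[0][n].get(root)


probs = train_pcfg(TREEBANK)
print(viterbi_cky("I saw the dog with the man".split(), probs))
```

In this unlexicalized sketch the two attachments of "with the man" use exactly the same word-level rules, so the words themselves cannot influence the decision: the model prefers VP attachment (probability 0.00288 versus 0.00072) for every sentence of this shape. That limitation is one motivation for models that, as the abstract describes, condition heavily on lexical information. A practical parser would also work with log probabilities to avoid numerical underflow on long sentences.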
Modified: Apr 23, 1998