MIT Department of Electrical Engineering & Computer Science

Unsupervised Language Acquisition

Carl de Marcken
MIT AI Laboratory

Tuesday, April 16, 1996
4:00 PM (3:45 refreshments)
Room NE43-518
EECS Special Seminar

Abstract

How is human language learned? In particular, how do discrete content-bearing elements like words emerge from highly variable, continuous speech signals? This is a fundamental open question in human and machine intelligence. In this talk I attack it as a problem of structural induction, using unsupervised learning techniques based on compression. The principal technical contribution of this work is a representation and learning algorithm that enable meaningful structure to be extracted from data even when uninteresting regularities abound.

Starting from a minimum description length modeling framework, I will introduce and motivate a hierarchical coding scheme that represents both signals and model parameters as compositions of model parameters. This coding scheme is extremely well behaved with respect to search, eliminates competition between interesting and uninteresting patterns in data, and mirrors linguistic mechanisms. Optimizing model parameters over a sequence of text or speech samples produces a statistical language model, a segmentation of the input, and a set of parameters that have natural linguistic interpretations.

The final models fare well on both linguistic and statistical grounds. I will present record text compression rates as well as record recall rates for intuitive segmentation boundaries in English and Chinese text and in transcripts of speech to children. I will describe extensions to the basic framework for learning from speech rather than text and present the first dictionaries learned directly from spoken utterances. Finally, time permitting, I will present extensions for learning aspects of grammar and for learning translation models between two languages, or between one language and representations of meaning.

HOST: Professor Eric Grimson


URL of this page: http://www-eecs.mit.edu/AY95-96/events/48.html
Created: Apr 8, 1996  | Modified: Jun 25, 1997
This announcement is from the MIT EECS 1995-96 archive.