MIT Department of Electrical Engineering & Computer Science

1995 (Spring Semester)
Colloquium Series

SCATTER-GATHER:
A BROWSING PARADIGM FOR INFORMATION RETRIEVAL

David Karger
MIT EECS, LCS

As more information becomes available online, retrieving the useful part of it becomes more important. Most prior work has focused on keyword searches of a database of articles. For example, the query "find CORPORATE and MERGER and COMMUNICATIONS and BILLION" might retrieve articles useful to someone interested in communications-industry mergers valued at more than one billion dollars.
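As a rough illustration of the kind of conjunctive keyword query described above, the following sketch returns only the articles that contain every keyword. The function names and the three-article collection are hypothetical, invented here for illustration; this is not taken from any particular retrieval system.

```python
import re

def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def and_query(articles, keywords):
    """Return the titles of articles that contain every keyword (boolean AND)."""
    wanted = {k.lower() for k in keywords}
    return [title for title, body in articles.items()
            if wanted <= tokenize(title + " " + body)]

# Hypothetical collection, made up for this example.
articles = {
    "Telecom deal": "A corporate merger in communications worth over a billion dollars.",
    "Local news": "The city council approved a new park.",
    "Bank results": "The corporate bank reported a billion in profit.",
}

matches = and_query(articles, ["CORPORATE", "MERGER", "COMMUNICATIONS", "BILLION"])
```

Only an article that mentions all four words is returned, which is why such a query works well when the user already knows how to describe what is wanted.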

Like the index in a book, the keyword query is great if you know just what you want and how to describe it. If your need is more vague, you may find the table of contents of the book more useful. Our system, Scatter/Gather, produces an "electronic table of contents" to complement the electronic book-index provided by keyword searches. It allows a user to address vague questions such as "what happened last month?" It lets the user BROWSE a collection and learn enough about it to formulate useful search requests.

In a few seconds our system can classify millions of documents into 10 or so topic-coherent "chapters" and display an understandable one-page table of contents. It is fully automatic, and relies on very straightforward mathematical rules to produce the chapters. The "meaning" a user sees in those chapters arises from the ability of a human being to see patterns that are too complicated for a computer to recognize. The system is a symbiosis in which the user and the computer both do what they are best at.
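The announcement does not say which mathematical rules the system uses, so the following is only a minimal sketch of one simple possibility, not the Scatter/Gather method itself. It assumes documents are compared by cosine similarity of word counts, that each document joins the most similar of a few hand-picked seed documents, and that a "chapter" is summarized by its most frequent words. All names and the four sample documents are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Word-count vector for a document (words of four or more letters only)."""
    return Counter(re.findall(r"[a-z]{4,}", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def scatter(docs, seeds):
    """Assign each document to the seed document it most resembles."""
    vecs = [vectorize(d) for d in docs]
    clusters = {s: [] for s in seeds}
    for i, v in enumerate(vecs):
        best = max(seeds, key=lambda s: cosine(v, vecs[s]))
        clusters[best].append(i)
    return clusters

def summarize(docs, members, n=3):
    """Most frequent words in a cluster: one 'table of contents' entry."""
    total = Counter()
    for i in members:
        total.update(vectorize(docs[i]))
    return [w for w, _ in total.most_common(n)]

# Hypothetical collection, made up for this example.
docs = [
    "Phone company merger creates communications giant",
    "Team wins championship game in overtime",
    "Merger talks between phone company rivals",
    "Star player injured before championship game",
]

clusters = scatter(docs, seeds=[0, 1])  # assumed seeds: documents 0 and 1
contents = {s: summarize(docs, members) for s, members in clusters.items()}
```

The computer only counts and compares words; it is the reader who recognizes the resulting word lists as "business" and "sports" chapters.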


URL of this page: http://www-eecs.mit.edu/AY94-95/events/s95-21.html
Created: Feb 16, 1995  | Modified: Jun 26, 1997
This announcement is from the MIT EECS 1994-95 archive.