MIT Department of Electrical Engineering & Computer Science

E E C S

Algorithms for Clustering

Moses Charikar
Stanford University

Wednesday, March 15, 2000
4:15 PM (refreshments 4:00)
Room NE43-518
EECS Special Seminar

Abstract

Clustering problems arise in various contexts including classification, information retrieval and data mining. Roughly speaking, clustering refers to partitioning a set of objects into groups of similar objects. The objects (e.g. documents or images) are usually represented as points in some space with a distance measure and the objective is to obtain clusters of points that are close to each other. Problems of this flavor also occur in discrete location theory, where the goal is to locate a set of facilities (e.g. factories or warehouses) so as to serve a given set of clients. This talk describes several algorithmic aspects of clustering problems.

The first part of the talk will focus on the approximability of several natural clustering objectives. The quality of a clustering is typically measured by an objective function and the goal of an algorithm is to minimize this. For most natural objective functions, the corresponding optimization problem turns out to be NP-hard. In the face of this fundamental intractability, researchers have shifted their focus from exact solutions to obtaining approximate solutions that are guaranteed to be close to the optimal. I will describe approximation algorithms for classical clustering and location problems such as k-median and facility location. Both linear programming based approaches as well as purely combinatorial approaches such as local search can be used to obtain constant factor approximations for these problems. Further, the ideas behind these algorithms extend to other problems such as min-sum clustering.

The second part of the talk will examine other algorithmic issues that arise in clustering. I will introduce clustering problems where outliers must be identified and excluded prior to clustering. I will also formulate problems and describe results related to the incremental maintenance of clusters in a dynamic point set.


URL of this page: http://www-eecs.mit.edu/AY99-00/events/48.html
Created: Mar 1, 2000  | Modified: Mar 1, 2000
This event is from the MIT EECS 1999-00 archive.  | Current events
To MIT EECS home page  | Your comments and inquiries are welcome.