MIT Electrical Engineering and Computer Science (EECS)

EECS Event

Abstractions for Fault Tolerant Distributed Computing

Idit Keidar
MIT, LCS

Monday, April 23, 2001
12:00 PM (refreshments 11:45)
Room NE43-518
EECS Special Seminar

Abstract

Building fault-tolerant distributed systems over unreliable network technologies is extremely hard. My primary research goal is to make fault-tolerant distributed systems easier to develop and to understand. This goal can be met by providing easy-to-use generic services that exhibit good performance and have clear interfaces and well-specified semantics. Such generic services should provide good abstractions, that is, capture aspects that are common to many systems. At the same time, they should provide sufficient flexibility for a variety of applications. To build useful generic services, my research approach combines a variety of techniques: formal modeling, specification, and verification, as well as implementation, performance tuning, and application-driven study.

In this talk, I will discuss some of my work on group communication, a group-based infrastructure for applications that require both consistency and performance, such as collaborative computing. Group communication has emerged as a central component of modern cluster computing environments. As demands on clusters continue to increase, we are seeing the emergence of wide-area computing systems in which a set of clusters is jointly administered. Existing group communication solutions are inadequate for such purposes, since they do not perform well in wide-area networks and have limited scalability.

I will present a new architecture for scalable group communication services in wide-area networks (WANs), and a new group membership algorithm, called Moshe, implemented within it. In contrast to previous group membership algorithms, Moshe typically executes in a single communication round. I will present experimental results from running Moshe over the Internet. Moshe is but one of a set of services that are needed to simplify the development of large-scale reliable networking applications. I will conclude the talk by describing emerging research directions that focus on new services and additional aspects of existing services.


This page:
http://www-eecs.mit.edu/AY00-01/events/68.html
Created: Apr 10, 2001  |  Modified: Apr 10, 2001