Tuesday, April 27, 1999
2:00 PM (refreshments 1:45)
Room NE43-518
EECS Special Seminar
Abstract
In this talk, I will discuss how to achieve common-case peak performance for I/O-intensive cluster applications. The motivation for this work is experience with NOW-Sort, a high-performance external sort for clusters of workstations. NOW-Sort attains peak performance only at night, when machines are otherwise idle and all potential sources of interference have been manually removed from the system. When run in a less sterile (and more realistic) environment, its performance suffers noticeably.
The main reason for this lack of "performance availability" is the presence of performance anomalies in clustered systems. Due to the complexity of both hardware and software, the behavior of machines in a seemingly homogeneous pool is often quite varied. Software predicated on this homogeneity will exhibit erratic performance, often an order of magnitude worse than expected. To remedy this, systems must assume that such performance variations exist and contain provisions to operate well in spite of them. To this end, I will describe a system called River. River employs two key ideas to avoid performance faults: load balancing, via high-performance distributed queues, and replication, with graduated declustering. With these two components, River facilitates the construction of applications that degrade gracefully even under severe performance anomalies, allowing data to flow seamlessly around such faults. The result is nearly ideal performance at all times -- whether day or night -- with applications effectively utilizing their share of resources.
HOSTS: Professor B. Lampson and Professor F. Kaashoek
Modified: Apr 17, 1999