Ensuring fast and seamless service to users is critical for cloud services. However, guaranteeing fast response is challenging because of the random service delays that are common in today's data centers. In this thesis we explore the use of redundancy to combat such service variability.
For example, replicating a computing task at multiple servers and waiting for the earliest copy to finish reduces service time. But redundant tasks consume additional computing resources and can delay subsequent tasks. We present a queueing-theoretic framework to answer fundamental questions such as: 1) how many replicas to launch, 2) which queues to join, and 3) when to issue and cancel the replicas. We identify surprising regimes where replication reduces both delay and resource cost. This task-replication idea also generalizes to the analysis of latency in content download from erasure-coded storage.
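The benefit of waiting for the earliest of several replicas can be seen in a minimal Monte Carlo sketch. This is an illustration only, not the thesis's model: it assumes i.i.d. exponential service times and ignores queueing and cancellation costs, and the function name is made up for this example.

```python
import random

def mean_earliest_copy(r, mu=1.0, trials=100_000, seed=0):
    """Estimate the mean time until the earliest of r replicated
    copies finishes, with i.i.d. Exp(mu) service times."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Service time of the task = minimum over its r replicas.
        total += min(rng.expovariate(mu) for _ in range(r))
    return total / trials

# With Exp(1) service, the minimum of r copies is Exp(r),
# so mean service time drops from 1 to about 1/r.
print(mean_earliest_copy(1))  # close to 1.0
print(mean_earliest_copy(2))  # close to 0.5
```

For exponential service times the minimum of r independent copies is itself exponential with rate r·mu, which is why the estimate shrinks roughly as 1/r; for other distributions the gain (and the resource cost trade-off studied in the thesis) can look very different.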
Achieving low latency is even more challenging in streaming communication because packets must be delivered both quickly and in order. Another focus of this thesis is developing erasure codes that ensure smooth playback in streaming.
Thesis Supervisor: Prof. Gregory Wornell
Thesis Committee: Prof. Emina Soljanin, Prof. Devavrat Shah