In datacenters, it is crucial that servers respond to requests quickly because many latency-sensitive applications require that the vast majority of requests complete within tens of microseconds. Unfortunately, state-of-the-art approaches for achieving low latency either leave CPU resources sitting idle or dedicate cores to spin-polling the network card, thereby achieving poor CPU efficiency. As Moore's Law slows and network speeds continue to rise, datacenter applications are increasingly bottlenecked on the CPU, making CPU efficiency ever more important.
This talk will describe two systems that enable applications to achieve low tail latency and high CPU efficiency simultaneously. The first system, Shenango, targets CPU scheduling and achieves high efficiency by reallocating cores across applications at microsecond scale as workloads vary over time. Shenango achieves such fast reallocation rates with three key ideas: an efficient algorithm that detects when applications would benefit from more cores, a centralized component called the IOKernel that orchestrates core reallocations, and an approach to flow steering that adapts quickly to changes in core allocations. Shenango achieves tail latencies comparable to those of existing kernel-bypass approaches such as ZygOS, while linearly trading throughput between latency-sensitive and batch applications as load varies over time, vastly improving CPU efficiency. The second system, Chimera, targets the network. It proposes to co-design congestion control with CPU scheduling by extending Shenango, enabling congestion control protocols to explicitly optimize for CPU efficiency.
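The congestion-detection idea behind Shenango's core-allocation algorithm can be illustrated with a minimal sketch. The premise (per the Shenango design) is that the IOKernel polls each application's queues at microsecond intervals; if any packet or thread that was queued at the previous poll is still queued now, the application cannot keep up and is a candidate for an additional core. All names and data structures below are illustrative, not Shenango's actual implementation:

```python
# Simplified sketch of Shenango-style congestion detection.
# Assumption: the IOKernel polls each app's queues every ~5 microseconds;
# queue contents are modeled here as sets of item IDs for clarity.

def is_congested(prev_snapshot, current_queue):
    """True if any item queued at the last poll is still queued now.

    An item that survives a full polling interval signals that the
    application has too few cores to drain its queues.
    """
    return any(item in current_queue for item in prev_snapshot)

def iokernel_poll(app_queues, snapshots):
    """One polling pass: flag congested apps, refresh snapshots."""
    congested = []
    for app, queue in app_queues.items():
        if is_congested(snapshots.get(app, set()), queue):
            congested.append(app)  # candidate for an extra core
        snapshots[app] = set(queue)  # remember contents for next poll
    return congested
```

In this sketch, a first poll never reports congestion (there is no prior snapshot), and an item that appears in two consecutive polls flags its application, mirroring the intuition that sustained queueing, rather than a momentary burst, should trigger a core grant.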
Thesis Supervisor(s): Hari Balakrishnan and Adam Belay