Parallelism is critical to achieve high performance in modern computer systems. Unfortunately, most programs scale poorly beyond a few cores, and those that scale well often require heroic implementation efforts. This is because current parallel architectures squander most of the parallelism available in applications and are too hard to program.
In this talk I will present Swarm, a new execution model, architecture, and system software that exploits far more parallelism than conventional multicores and is almost as easy to program as a sequential thread. Swarm programs consist of tiny tasks, as small as tens of instructions each. Parallelism is implicit: tasks create new tasks as they run. Synchronization is implicit: the programmer specifies a total or partial order on tasks, which eliminates the correctness pitfalls of explicit synchronization (e.g., deadlock and data races). Swarm hardware uncovers parallelism by speculatively running tasks out of order, even thousands of tasks ahead of the earliest active task. Its speculation mechanisms build on decades of prior work, but Swarm is the first parallel architecture to scale to hundreds of cores, thanks to its new programming model, distributed structures, and distributed protocols. Leveraging its support for task order, Swarm incorporates new techniques to reduce data movement, harness nested parallelism, and speculate selectively for improved efficiency.
Swarm achieves efficient near-linear scaling to hundreds of cores on otherwise hard-to-scale irregular applications. These span a broad set of domains, including graph analytics, discrete-event simulation, databases, machine learning, and genomics. Swarm even accelerates applications that are conventionally deemed sequential. It outperforms recent software-only parallel algorithms by one to two orders of magnitude, and sequential implementations by up to 600x at 256 cores.
Mark Jeffrey is a PhD candidate in Electrical Engineering and Computer Science at MIT, where he works with Professor Daniel Sanchez. His research builds computer systems that scale hard-to-parallelize applications through new programming models and parallel hardware architectures. Mark earned an MASc and a BASc in Engineering Science from the University of Toronto. He has received a Facebook Graduate Fellowship, an NSERC Post-Graduate Scholarship, and an IEEE Micro Top Picks award and honorable mention.
Thesis Supervisor: Prof. Daniel Sanchez