Doctoral Thesis: Compositional Analytical-ML Fusion: A New Paradigm for Modeling Computer Systems

Thursday, February 12
4:00 pm - 5:00 pm

Kiva (32-G449, Stata Center)

By: Arash Nasr-Esfahany

Supervisor: Professor Mohammad Alizadeh


Performance modeling—predicting how systems behave before they are built—underpins computer system design. Yet the field faces a fundamental tradeoff between abstraction levels: low-level simulators (cycle/packet-level) offer fidelity but are prohibitively slow and expensive to maintain; analytical models are fast but too approximate for quantitative prediction. As systems grow more complex and AI-driven design demands millions of evaluations, this gap becomes a bottleneck.

This thesis introduces Compositional Analytical-ML Fusion, a paradigm that achieves the fidelity of low-level simulation at the speed of analytical models. The key insight is a division of labor: encode domain knowledge (the mental models designers already possess) as simple analytical models capturing first-order effects; let machine learning learn the remaining dynamics from data.
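The division of labor above can be illustrated with a toy sketch: an analytical model supplies a first-order prediction, and a learned component calibrates the residual from data. All names, the linear "learner," and the synthetic stall penalty here are illustrative assumptions, not the thesis's actual models.

```python
def analytical_latency(num_instructions, issue_width):
    """First-order analytical bound: ideal throughput-limited cycles."""
    return num_instructions / issue_width

def fit_residual(xs, ys):
    """1-D least squares (residual ~ a*x + b), a stand-in for the ML component."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    return a, my - a * mx

# Synthetic "measurements": the analytical bound plus a stall penalty
# (proportional to cache miss rate) that the simple model misses.
workloads = [(1000, 4, 0.02), (2000, 4, 0.10), (1500, 4, 0.05), (3000, 4, 0.08)]
measured = [analytical_latency(n, w) + 400 * m for n, w, m in workloads]

# Learn only the residual: measured minus analytical.
miss_rates = [m for _, _, m in workloads]
resid = [t - analytical_latency(n, w) for (n, w, _), t in zip(workloads, measured)]
a, b = fit_residual(miss_rates, resid)

def fused_predict(num_instructions, issue_width, miss_rate):
    """Fused model: analytical first-order term plus learned correction."""
    return analytical_latency(num_instructions, issue_width) + a * miss_rate + b
```

The analytical term keeps the predictor interpretable and sample-efficient; the learned term only has to capture what the domain model leaves out.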

Validation spans the systems stack. When all relevant state is observed, standard supervised learning suffices. For CPU microarchitecture, Concorde combines per-component analytical throughput bounds with a lightweight neural predictor, achieving a 100,000x speedup over cycle-level simulation with 2–3% error. For data center networks, m3 predicts flow completion times on 6,000 hosts in 40 seconds versus 12 hours for packet simulation, with single-digit error.

When latent variables confound predictions, standard supervised methods fail: in video streaming, an expert-designed simulator predicted that a tuned algorithm would perform worse than the baseline, while deployment showed it performing 2.6x better. CausalSim addresses this by encoding causal structure and exploiting randomized experiments; its theoretical foundations (Bijective Generation Mechanisms) establish which data-collection strategies unlock which counterfactual questions.

These contributions demonstrate that performance modeling need not choose between fidelity and speed: the paradigm adapts to what can be observed—encoding mechanistic structure when state is visible, causal structure when latent variables matter—with ML calibrating the rest.