Doctoral Thesis: Compositional Analytical-ML Fusion: A New Paradigm for Modeling Computer Systems
By: Arash Nasr-Esfahany
Supervisor: Professor Mohammad Alizadeh
Details
- Date: Thursday, February 12
- Time: 4:00 pm - 5:00 pm
- Location: Kiva (32-G449, Stata Center)
Performance modeling—predicting how systems behave before they are built—underpins computer system design. Yet the field faces a fundamental tradeoff between abstraction levels: low-level simulators (cycle/packet-level) offer fidelity but are prohibitively slow and expensive to maintain; analytical models are fast but too approximate for quantitative prediction. As systems grow more complex and AI-driven design demands millions of evaluations, this gap becomes a bottleneck.
This thesis introduces Compositional Analytical-ML Fusion, a paradigm that achieves the fidelity of low-level simulation at the speed of analytical models. The key insight is a division of labor: encode domain knowledge (the mental models designers already possess) as simple analytical models capturing first-order effects; let machine learning learn the remaining dynamics from data.
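This division of labor can be illustrated with a minimal sketch. The names, features, and numbers below are invented for illustration: an analytical model captures the first-order effect (peak-IPC throughput), and a learned component calibrates the residual that the analytical model misses (here, a memory-stall effect), with a linear least-squares fit standing in for the neural predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical workload features: instruction count and cache miss rate.
n = 200
insts = rng.uniform(1e3, 1e5, n)
miss_rate = rng.uniform(0.0, 0.2, n)

# Analytical first-order model: latency ~ instructions / peak IPC.
IPC_PEAK = 4.0
analytic = insts / IPC_PEAK

# Synthetic "ground truth": a second-order memory-stall effect the
# analytical model ignores (3x penalty scaling with miss rate).
true_latency = analytic * (1.0 + 3.0 * miss_rate)

# ML component: learn the residual ratio true/analytic from features.
# A linear least-squares fit stands in for the neural predictor.
X = np.column_stack([np.ones(n), miss_rate])
coef, *_ = np.linalg.lstsq(X, true_latency / analytic, rcond=None)

# Fused prediction: analytical first-order estimate, ML-calibrated.
fused = analytic * (X @ coef)
rel_err = np.abs(fused - true_latency) / true_latency
print(rel_err.max())
```

The point of the sketch: the learned component only has to model the gap between the analytical estimate and reality, a far easier target than predicting latency from scratch.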
Validation spans the systems stack. When all relevant state is observed, standard supervised learning suffices. For CPU microarchitecture, Concorde combines per-component analytical throughput bounds with a lightweight neural predictor, achieving a 100,000x speedup over cycle-level simulation with 2–3% error. For data center networks, m3 predicts flow completion times on 6,000 hosts in 40 seconds versus 12 hours for packet simulation, with single-digit percent error.
When latent variables confound predictions, standard methods fail: in video streaming, expert-designed simulation predicted a tuned algorithm would perform worse than baseline; deployment showed it performed 2.6x better. CausalSim addresses this by encoding causal structure and exploiting randomized experiments; the theoretical foundations (Bijective Generation Mechanisms) establish which data collection strategies unlock which counterfactual questions.
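The causal idea can be sketched in miniature. Everything below is an invented toy, not CausalSim itself: each trace carries a latent confounder (say, path quality) that the generation mechanism maps invertibly to the observed outcome, and randomized policy assignment makes the latent independent of the action, so the relative policy effect is identifiable and each observed trace can be replayed under the other policy.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each trace has a latent "path quality" u; two
# policies scale the outcome multiplicatively by an unknown effect v.
n = 10_000
u = rng.lognormal(0.0, 0.5, n)      # latent confounder, never observed
v = np.array([1.0, 1.3])            # true per-policy effects (unknown)

# Randomized assignment breaks the dependence between u and the policy.
policy = rng.integers(0, 2, n)
y_obs = u * v[policy]               # only the observed outcomes are kept

# Under randomization, comparing group means identifies v[1] / v[0].
ratio = y_obs[policy == 1].mean() / y_obs[policy == 0].mean()

# Because the mechanism is invertible in u (multiplicative here), each
# observed outcome pins down its latent, so we can replay a trace that
# ran policy 0 under policy 1.
y_cf = y_obs[policy == 0] * ratio   # counterfactual outcomes
print(abs(ratio - v[1] / v[0]))
```

Without randomization (e.g. if better paths were systematically assigned policy 1), the group-mean comparison would absorb the confounder and the counterfactuals would be biased, which is the failure mode the video-streaming example above exhibits.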
These contributions demonstrate that performance modeling need not choose between fidelity and speed: the paradigm adapts to what can be observed—encoding mechanistic structure when state is visible, causal structure when latent variables matter—with ML calibrating the rest.