Naiad: a system for incremental, iterative and interactive parallel computation
We are developing a new system for large-scale data analysis, called "Naiad", that aims to support complex iterative queries over dynamic inputs at interactive timescales. Like many existing systems, Naiad supports high-level declarative queries, data-parallel execution, and transparent distribution. Unlike those systems, Naiad can efficiently execute queries with multiple (possibly nested) iterative loops while also supporting low-latency incremental changes to the query inputs. To achieve this, Naiad generalizes traditional incremental dataflow to admit collections that vary in multiple independent dimensions, each corresponding to a distinct "reason" for which the collection may have changed. This flexibility allows far greater re-use of previous work when collections change for multiple reasons, such as external stimuli and internal feedback.
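To make the idea of a collection varying in several independent dimensions concrete, here is a minimal toy sketch in Python. It is not Naiad's API. The names `leq`, `collection_at` and `diffs` are hypothetical, and the sketch assumes two dimensions, an input epoch (external stimuli) and a loop iteration (internal feedback), compared under a product partial order. Each timestamp stores only a difference, and a collection is recovered by summing the differences at all earlier-or-equal timestamps.

```python
from collections import Counter

def leq(s, t):
    """Product partial order: s <= t iff every coordinate of s is <= t's."""
    return all(a <= b for a, b in zip(s, t))

def collection_at(diffs, t):
    """Rebuild the multiset at timestamp t by summing all differences
    recorded at timestamps s with s <= t (hypothetical helper)."""
    acc = Counter()
    for s, delta in diffs.items():
        if leq(s, t):
            for record, weight in delta.items():
                acc[record] += weight
    # Drop records whose net multiplicity cancelled to zero.
    return {r: w for r, w in acc.items() if w != 0}

# Differences indexed by (epoch, iteration); the values are made up
# purely for illustration.
diffs = {
    (0, 0): {"a": 1, "b": 1},   # initial input
    (0, 1): {"c": 1},           # change caused by one loop iteration
    (1, 0): {"b": -1, "d": 1},  # change caused by new external input
    (1, 1): {"e": 1},           # change caused by both together
}

print(collection_at(diffs, (0, 1)))
print(collection_at(diffs, (1, 0)))
print(collection_at(diffs, (1, 1)))
```

In this sketch the timestamps (0, 1) and (1, 0) are incomparable, so neither difference is folded into the other's collection. The collection at (1, 1) reuses both differences and only needs the extra entry recorded at (1, 1) itself, which is the kind of re-use the paragraph above describes.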
This is a talk in three parts. First, I will introduce "differential dataflow", the new computational framework that enables Naiad to compute iterations and incremental updates efficiently. Second, I will discuss how we have implemented Naiad as a decentralized distributed system, and how this lets the system scale even when the amount of work per increment is small. Finally, I will demonstrate how Naiad can be used to perform complex analytics interactively on a real-world social networking dataset.
This is joint work with Frank McSherry, Rebecca Isaacs, Michael Isard and Martín Abadi.