Naiad: a system for incremental, iterative and interactive parallel computation

Event Speaker: 

Derek Murray, Microsoft Research

Event Location: 

G882

Event Date/Time: 

Thursday, September 6, 2012 - 3:00pm

Naiad: a system for incremental, iterative and interactive parallel computation

Speaker: Derek Murray, Microsoft Research
Date: Thursday, September 6, 2012
Time: 3:00PM to 4:00PM
Location: G882
Host: Frans Kaashoek, MIT
Contact: Frans Kaashoek, kaashoek@mit.edu

We are developing a new system for large-scale data analysis -- called "Naiad" -- which has the goal of supporting complex iterative queries over dynamic inputs at interactive timescales. Like many existing systems, Naiad supports high-level declarative queries, data-parallel execution, and transparent distribution. Unlike these systems, Naiad can efficiently execute queries with multiple (possibly nested) iterative loops, while simultaneously supporting low-latency incremental changes to the query inputs. To achieve this, Naiad generalizes traditional incremental dataflow to admit collections that vary in multiple independent dimensions, each corresponding to a distinct "reason" for which the collection may have changed. This flexibility allows far greater re-use of previous work when collections may change for multiple reasons, such as external stimuli and internal feedback. 
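As a rough illustration of a collection that varies in two independent dimensions, the toy sketch below stores changes as differences keyed by an (epoch, iteration) pair, where the epoch stands for an external input change and the iteration for internal loop feedback. It is not Naiad's API or implementation: the `accumulate` function, the `diffs` trace, and the record names are all hypothetical, chosen only to show how a collection at a given time could be rebuilt by summing every earlier difference in both dimensions.

```python
from collections import Counter

def accumulate(diffs, time):
    """Rebuild the collection at `time` by summing every difference whose
    timestamp is <= `time` in BOTH dimensions (epoch and iteration)."""
    epoch, iteration = time
    total = Counter()
    for (e, i), delta in diffs.items():
        if e <= epoch and i <= iteration:
            for record, count in delta.items():
                total[record] += count
    # Records whose counts cancel to zero are no longer in the collection.
    return {record: count for record, count in total.items() if count != 0}

# Hypothetical trace: each entry records only what changed at that time.
diffs = {
    (0, 0): {"a": 1, "b": 1},   # initial input
    (0, 1): {"c": 1},           # loop feedback in epoch 0 adds c
    (1, 0): {"b": -1, "d": 1},  # external update: b removed, d added
    (1, 1): {"e": 1},           # extra change needed only for the new input
}

first_epoch_result = accumulate(diffs, (0, 1))
updated_input = accumulate(diffs, (1, 0))
updated_result = accumulate(diffs, (1, 1))
```

In this sketch the result for the updated input at iteration 1 reuses the difference recorded at (0, 1) rather than recomputing it, which is the kind of re-use of previous work the abstract describes.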

This is a talk in three parts. First, I will introduce "differential dataflow", which is the new computational framework that enables Naiad to compute iterations and incremental updates efficiently. I will go on to discuss how we have implemented Naiad as a decentralized distributed system, and how this lets the system scale even when the amount of work per increment is small. Finally, I will give a demonstration of how Naiad can be used to perform complex analytics interactively on a real-world social networking dataset. 

This is joint work with Frank McSherry, Rebecca Isaacs, Michael Isard and Martín Abadi.