Doctoral Thesis: Bridging Theory and Practice in Parallel Clustering

Friday, April 21
10:00 am - 12:00 pm


Jessica Shi


Large-scale graph processing is a fundamental tool in modern data mining, yet it poses a major computational challenge as graph sizes increase. In particular, graph clustering, or community detection, is an important problem in graph processing with wide-ranging applications spanning social network analysis, recommendation and search systems, metabolic pathway analysis, and machine learning pipelines. At its core, identifying the underlying substructures of a graph can reveal essential functional groups, such as people with similar interests, news articles on similar topics, or proteins with similar utilities, which can then be used in a variety of applications. In this talk, I will describe my work on designing highly scalable and provably efficient algorithms for a broad class of computationally expensive graph clustering problems. My research approach is to bridge theory and practice in parallel algorithms, which has resulted in the first practical solutions to a number of problems on graphs with hundreds of billions of edges. This talk will focus on new parallel algorithms for nucleus decomposition and correlation clustering, with improved theoretical bounds, demonstrably fast performance on real-world datasets, and real-world impact in production environments at Google.


  • Location: 32-D463
Thesis Supervisor: Prof. Julian Shun

Contact the doctoral candidate if you wish to attend via Zoom.