Incorporating latent or hidden variables is a crucial aspect of statistical modeling. I will present a statistical and a computational framework for guaranteed learning of a wide range of latent variable models. I will focus on two instances, viz., community detection and overcomplete representations.
The goal of community detection is to discover hidden communities from graph data. I will present a tensor decomposition approach for learning probabilistic mixed membership models. The tensor approach is guaranteed to correctly recover the mixed membership communities with tight guarantees. We have deployed it on many real-world networks, e.g. Facebook, Yelp and DBLP. It is easily parallelizable, and is orders of magnitude faster than the state-of-art stochastic variational approach.
I will then discuss recent results on learning overcomplete latent representations, where the latent dimensionality can far exceed the observed dimensionality. I will present two frameworks, viz., sparse coding and sparse topic modeling. Identifiability and efficient learning are established under some natural conditions such as incoherent dictionaries or persistent topics.
Anima Anandkumar is a faculty at the EECS Dept. at U.C. Irvine since August 2010. Her research interests are in the area of large-scale machine learning and high-dimensional statistics. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She has been a visiting faculty at Microsoft Research New England in 2012 and a postdoctoral researcher at the Stochastic Systems Group.