Nearest-neighbor approaches have been widely used in numerous applications, such as forecasting which news topics will go viral, recommending products to people in online stores, and identifying objects in images by examining image patches. However, in many such cases there has been little theoretical development toward understanding when, why, and how well these approaches work, accounting for the structure of the problem of interest. This thesis bridges the gap between theory and practice for such nonparametric inference methods through three specific case studies: time series classification, online collaborative filtering, and patch-based image segmentation. In particular, for each of these problems, we prescribe a probabilistic model in which the data appear generated from unknown "latent sources" that capture key structure in the problem. These latent source models naturally lead to nearest-neighbor inference methods similar to ones already used in practice. We derive theoretical performance guarantees for these methods, relating the quality of inference to the amount of training data available and to structural properties specific to each case study.
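As an illustrative sketch only (not the thesis's specific method), the kind of nearest-neighbor rule whose performance such guarantees describe can be stated in a few lines: classify a new point with the label of its closest training example. All names and the toy data below are hypothetical.

```python
import numpy as np

def nearest_neighbor_classify(train_X, train_y, x):
    """Return the label of the training point closest to x (1-NN rule).

    A minimal illustrative sketch; analyses of this style of rule relate
    accuracy to the number of training examples and to problem structure.
    """
    dists = np.linalg.norm(train_X - x, axis=1)  # Euclidean distance to each training point
    return train_y[int(np.argmin(dists))]

# Toy usage: two well-separated clusters standing in for two "latent sources"
train_X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array([0, 0, 1, 1])
print(nearest_neighbor_classify(train_X, train_y, np.array([4.8, 5.1])))  # → 1
```

Under a latent source model, intuitively, a new observation is a noisy copy of one of the unknown sources, so with enough training data some training example is close to the same source and the 1-NN rule recovers the correct label with high probability.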
Thesis Supervisors: Polina Golland, Devavrat Shah