# Population Recovery and Partial Identification

SHARE:

## Event Speaker:

Avi Wigderson, IAS

32-124

## Card Description:

We study several natural problems in which an {\em unknown} distribution over an {\em unknown} population of vectors needs to be recovered from partial or noisy samples. Such problems naturally arise in a variety of contexts in learning, clustering, statistics, data mining and database privacy, where loss and error may be introduced by nature, inaccurate measurements, or on purpose.

## Event Date/Time:

Wednesday, October 10, 2012 - 4:15pm

## Population Recovery and Partial Identification

Speaker: Avi Wigderson, IAS
Date: Wednesday, October 10 2012
Time: 4:15PM to 5:15PM
Refreshments: 3:45PM
Location: 32-124
Host: Dana Moshkovitz, CSAIL, MIT
Contact: Be Blackburn, 3-6098, imbe@mit.edu

Relevant URL: Special Date - Joint with Combinatorics

We study several natural problems in which an {\em unknown} distribution over an {\em unknown} population of vectors needs to be recovered from partial or noisy samples. Such problems naturally arise in a variety of contexts in learning, clustering, statistics, data mining and database privacy, where loss and error may be introduced by nature, inaccurate measurements, or on purpose. We give fairly efficient algorithms to recover the data under fairly general assumptions, when loss and noise are close to the information theoretic limit (namely, nearly completely obliterate the original data). Underlying one of our algorithms is a new structure we call a {\em partial identification (PID) graph}. While standard IDs are subsets of features (vector coordinates) that uniquely identify an individual in a population, partial IDs allow ambiguity (and "imposters"), and thus can be shorter. PID graphs capture this imposter-structure. PID graphs yield strategies for dimension reductions of recovery problems, and the re-assembly of this local pieces of statistical information to a global one. The combinatorial heart of this work is proving that every set of vectors admits partial IDs with "cheap" PID graphs (and hence efficient recovery). We further show how to find such near-optimal PIDs efficiently. Time permitting, I will also describe our original motivation for studying these recovery problems above: a new learning model we call "restriction access", introduced in earlier work. This model aims at generalizing prevailing "black-box" access to functions when trying to learn the "device" (e.g. circuit, decision tree, polynomial...) which computes them. We propose a "grey-box" access that allows certain partial views of the device, obtained from random restrictions. Our recovery algorithms above allow positive learning results for the PAC-learning analog of our model, for such devices as decision trees and DNFs, which are currently beyond reach in the standard "black-box" version of PAC-learning. Based on joint works with Zeev Dvir, Anup Rao and Amir Yehudayoff

See other events that are part of Theory Colloquium 2012/2013

See other events happening in October 2012