Doctoral Thesis: Testing Assumptions of Learning Algorithms

Monday, March 4
2:00 pm - 3:30 pm

Star (D263)

By: Arsen Vasilyan

Supervisors: Jonathan Kelner and Ronitt Rubinfeld


  • Date: Monday, March 4
  • Time: 2:00 pm - 3:30 pm
  • Category:
  • Location: Star (D263)
Additional Location Details:


Our society increasingly relies on algorithms and data analysis to make critical decisions. Yet, almost all work in the theory of supervised learning has long relied on the following two assumptions:
1) Distributional assumptions: data satisfies conditions such as Gaussianity or uniformity over d-bit strings.
2) No distribution shift: when the classifier is deployed, data comes from the same distribution as the training examples.

While natural and often correct, these assumptions oftentimes do not hold. Yet, these assumptions are routinely made for giving theoretical guarantees for supervised learning algorithms. These guarantees can become null and void, should one of these algorithms be used in a setting where these assumptions do not hold. Overall, if critical decisions rely on theoretical reliability guarantees, incorrect assumptions can result in catastrophic failure.

While completely eliminating the dependence on assumptions is provably impossible, the first part of this thesis shows how to mitigate this dependence in order to avoid catastrophic failure. We introduce and develop testers which can alert a user if some assumptions are not satisfied. Conversely, if the tester finds a specific dataset to be satisfactory, then a user can be confident in the guarantee given by the learning algorithm. Thus, having such a tester enables a user to make sure that a learning algorithm is safe to apply to a specific dataset.

Leveraging insights from the area of property testing, the first part of this thesis constructs such testers for a number of well-studied function classes, addressing distributional assumptions and distribution shift.

The second part of the thesis shows how insights from property testing can also be used to enhance learning algorithms in a different way.
We consider the following problems relating to the well-studied class of monotone functions over d-bit strings:

i) Proper learning: Given uniform samples labelled by a monotone function f, find a monotone function that approximates f up to given error.
ii) Proper agnostic learning: Given uniform samples labelled by an arbitrary function f, find a monotone function that approximates f best among all monotone functions (up to given error).
iii) Distance approximation:  Given uniform samples labelled by an arbitrary function f, approximate the distance of f to the monotone function closest to f (up to given error).

Since 1996, it was known that all the problems above can be solved using only 2^(d^0.5) samples, yet prior to our work no known algorithms for any of these tasks ran in time 2^{o(d)}. The second half of this thesis closes this computational-to-statistical gap by giving algorithms for each of the problems above that run in time 2^{d^0.5}. This is done by leveraging insights from the theory of local computation algorithms.

Thesis based on joint works with Aravind Gollakota, Adam Klivans, Jane Lange, Ronitt Rubinfeld and Kostantinos Stavropoulos.