Doctoral Thesis: Self-Supervised Learning for Speech Processing

Thursday, April 14
3:00 pm

32-449 (Kiva)

Yu-An Chung

Deep neural networks trained with supervised learning algorithms on large amounts of labeled
speech data have achieved remarkable performance on various spoken language processing
applications, often setting the state of the art on the corresponding leaderboards. However,
their reliance on large amounts of annotated speech for training creates a scalability
bottleneck for further advances in state-of-the-art performance, and an even more
fundamental barrier to deploying deep neural networks in speech domains where labeled
data are intrinsically rare, costly, or time-consuming to collect.

In contrast to annotated speech, untranscribed audio is often much cheaper to accumulate. In
this thesis, we explore the use of self-supervised learning, a learning paradigm in which the
learning target is generated from the input itself, to leverage such easily scalable resources
and improve the performance of spoken language technology. Specifically, we propose two
self-supervised algorithms, one based on the idea of “future prediction” and the other based
on the idea of “predicting the masked from the unmasked,” for learning contextualized speech
representations from unlabeled speech data. We show that our self-supervised algorithms
learn representations that make high-level properties of speech signals, such as their
phonetic content and speaker characteristics, more accessible than traditional acoustic
features do, and we demonstrate their effectiveness in improving the performance of deep
neural networks on a wide range of speech processing tasks. In addition to presenting new
learning algorithms, we provide an extensive analysis aimed at understanding the properties
of the learned self-supervised representations and at identifying the design factors that
distinguish one self-supervised model from another.
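The two ideas named in the abstract share one property: the training target comes from the unlabeled audio itself. The sketch below illustrates only that property, by showing how (input, target) pairs could be built for each idea. It is not the thesis's implementation. It assumes speech has already been converted into a sequence of acoustic feature frames (for example, log Mel spectrogram frames), and the helper names, the 3-frame prediction shift, the 15% masking rate, and the choice to zero out masked frames are all hypothetical. No model or loss function is included.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical representation: an utterance is a list of acoustic feature
# frames (e.g. log Mel vectors), each frame a list of floats.
Frame = List[float]


def future_prediction_pairs(
    frames: Sequence[Frame], shift: int = 3
) -> List[Tuple[Frame, Frame]]:
    """Pair each frame x_t with the future frame x_{t+shift}.

    A model reading frames up to time t would be trained to predict the
    frame `shift` steps ahead. The default shift of 3 is an illustrative
    assumption, not a value taken from the thesis.
    """
    if shift < 1:
        raise ValueError("shift must be a positive number of frames")
    return [
        (list(frames[t]), list(frames[t + shift]))
        for t in range(len(frames) - shift)
    ]


def masked_prediction_example(
    frames: Sequence[Frame], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[Frame], List[int], List[Frame]]:
    """Hide a random subset of frames; the hidden frames become the targets.

    Returns (corrupted input, masked positions, original frames at those
    positions). A model would receive the corrupted input and be trained to
    reconstruct the targets from the unmasked context. Zeroing masked frames
    is a hypothetical simplification; real systems may instead use a learned
    mask embedding and mask contiguous spans.
    """
    if not frames:
        return [], [], []
    rng = random.Random(seed)
    dim = len(frames[0])
    masked_idx = [t for t in range(len(frames)) if rng.random() < mask_prob]
    masked = set(masked_idx)
    corrupted = [
        [0.0] * dim if t in masked else list(f) for t, f in enumerate(frames)
    ]
    targets = [list(frames[t]) for t in masked_idx]
    return corrupted, masked_idx, targets


# Usage on a toy 8-frame "utterance" with 2-dimensional features.
frames = [[float(t), float(t) / 10] for t in range(8)]
pairs = future_prediction_pairs(frames, shift=3)
print(len(pairs), pairs[0])  # 5 ([0.0, 0.0], [3.0, 0.3])
corrupted, masked_idx, targets = masked_prediction_example(frames, mask_prob=0.5, seed=1)
```

In both cases no human annotation is needed: the first sketch derives its targets from the frames that come later in time, and the second from the frames it deliberately hides.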



Thesis Supervisor(s): James Glass, Jacob Andreas, Phillip Isola

To attend this defense via Zoom, please contact the doctoral candidate at