We live in an era of almost unlimited access to data. Yet without proper tagging and annotation, we often struggle to make effective use of most of it. Sometimes the labels we have are not even the ones we need for the task at hand. Asking human experts for input can be time-consuming and expensive, which creates a need for better ways to handle and process unlabeled data.
In particular, successful methods in unsupervised domain adaptation can automatically recognize systematic changes in the input and adapt existing algorithms to them. Furthermore, methods that can organize incoming streams of information allow us to derive insights with minimal manual labeling effort; this is the notion of weakly supervised learning.
In this thesis, we explore these two themes in the context of speaker and language recognition. First, we consider the problem of adapting an existing speaker recognition algorithm to a systematic change in the input domain. Then we consider the scenario in which we start with only unlabeled data and may select a subset of examples to be labeled, with the goal of minimizing the number of actively labeled examples needed to achieve acceptable speaker recognition performance. In this presentation, we will focus on the problem of language recognition, where we aim to decrease our reliance on transcribed speech by using a large-scale model that discovers sub-word units from multilingual data in an unsupervised manner. In doing so, we observe the impact of even small amounts of linguistic knowledge and use this as inspiration to improve our sub-word unit discovery methods with weak, pronunciation-equivalent constraints.
Thesis Supervisors: Dr. Jim Glass and Prof. Najim Dehak (now at Johns Hopkins University)