Abstract: In this talk I will discuss a new generation of software tools based on probabilistic models learned from large codebases. These tools leverage the massive effort already spent by thousands of programmers to make useful predictions about new, unseen programs, thus helping to solve important and difficult software tasks. As examples, I will present several such practical systems, including statistical code completion, deobfuscation, and defect prediction. Two of these systems (jsnice.org and apk-deguard.com) are freely available and already have thousands of users. I will also present some of the core machine learning techniques underlying our tools, including new probabilistic models of code that are more precise than state-of-the-art neural networks while requiring fewer computational resources to train and use.
Short Bio: Veselin Raychev obtained his PhD from ETH Zurich in 2016 on the topic of “Learning from Large Codebases”. Before this, he worked as a software engineer at Google on the public transportation routing algorithm of Google Maps as well as several other projects. His research interests include machine learning, program analysis, program synthesis and algorithms.
Host: Armando Solar-Lezama