Doctoral Thesis: Format Abstractions for Compilation of Sparse Tensor Algebra
Multidimensional arrays (tensors) are commonly used to represent data in many domains, including data analytics, machine learning, engineering, and the physical sciences. Many highly-optimized libraries and compilers have been developed for efficiently computing on dense tensors. However, existing libraries and compilers are limited in their ability to support many real-world applications that work with sparse tensors, which contain mostly zeros. In particular, there exist countless specialized representations (formats) for storing sparse tensors in memory, each suited to specific types of applications and data. Since different formats can use very different data structures to store nonzeros though, computing with sparse tensors that are stored in different formats can require vastly dissimilar code that are difficult to implement by hand and also non-trivial to automatically generate. Existing libraries and compilers thus only support computing limited sets of operations on sparse tensors stored in limited sets of formats, sacrificing usability and performance as a result.
In this thesis, I show how to build a compiler that supports efficiently computing with sparse tensors stored in disparate formats. I first show how a wide range of commonly-used sparse tensor formats, from array-based formats such as CSR, COO, and DIA to pointer-based formats such as linked lists, BSTs, and C-trees, can all be expressed as compositions of per-dimension formats. I further demonstrate how users can define such per-dimension formats by implementing abstractions we developed that precisely capture how different data structures store nonzeros in memory. I then show how, with these implementations at hand, a compiler can generate code to efficiently compute with tensors that are stored in any of the aforementioned—and countless other—formats. We have implemented our technique in the TACO sparse tensor algebra compiler, which is the first compiler to generate code that computes any basic tensor algebra expression with sparse tensors stored in arbitrary formats. Our technique generates code that has performance competitive with equivalent implementations in hand-optimized libraries and frameworks.
- Date: Tuesday, June 7
- Time: 1:00 pm
- Category: Thesis Defense
- Location: via zoom
Additional Location Details:
Thesis Supervisor: Prof. Saman Amarasinghe
To attend via zoom, please contact the doctoral candidate at email@example.com