Doctoral Thesis: Flexible Energy-Aware Image and Transformer Processors for Edge Computing
Machine learning inference on edge devices for image and language processing has become increasingly common in recent years, but it faces challenges from high memory and computation requirements coupled with limited energy resources. We apply different quantization schemes and training techniques to reduce the cost of running these models and to provide flexibility in the hardware. Energy scalability is achieved through bit-width scaling as well as model-size scaling. We design, tape out, and test three neural network accelerators that apply these techniques to enable efficient inference for a variety of applications.

First, we demonstrate a CNN accelerator that simplifies computation with nonlinearly quantized weights by reordering multiplication and accumulation. This modified compute requires additional storage elements compared to a conventional approach; to minimize the area overhead, we design a custom accumulator array layout. The second chip targets moderately sized Transformer models using piecewise-linear quantization for both weights and activations. Lastly, we present an energy-adaptive accelerator for natural language understanding based on lightweight Transformer models. The model size can be adjusted by sampling the weights of the full model to obtain differently sized submodels, achieving up to 5.8x energy and latency scalability between the largest and smallest submodels.
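The reordering of multiplication and accumulation mentioned for the first chip can be illustrated in software. The sketch below is an illustrative assumption, not the thesis's actual scheme: with weights drawn from a small nonlinear codebook, activations are first summed into one accumulator per weight level (additions only), and a single multiply per level is performed at the end. The codebook values and index encoding here are hypothetical.

```python
import numpy as np

# Hypothetical nonlinear weight codebook (small set of quantization levels).
LEVELS = np.array([-1.0, -0.25, 0.0, 0.25, 1.0])

def dot_accumulate_then_multiply(acts, level_idx):
    """Dot product with codebook-quantized weights, computed by
    accumulating activations per weight level first, then doing
    only one multiply per level at the end."""
    sums = np.zeros(len(LEVELS))
    for a, k in zip(acts, level_idx):
        sums[k] += a                      # accumulation only, no multiplies
    return float(np.dot(sums, LEVELS))    # one multiply per codebook level

# Reference: conventional multiply-accumulate over decoded weights.
acts = np.array([0.5, -1.5, 2.0, 0.75])
idx = np.array([4, 1, 4, 0])              # encoded weights: 1.0, -0.25, 1.0, -1.0
ref = float(np.dot(acts, LEVELS[idx]))
out = dot_accumulate_then_multiply(acts, idx)
```

The payoff in hardware is that the inner loop contains no multipliers, at the cost of one accumulator register per codebook level, which is the storage overhead the custom accumulator array layout addresses.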
Thesis Supervisor: Prof. Anantha P. Chandrakasan
- Date: Wednesday, July 26
- Time: 2:00 pm - 3:30 pm
- Category: Thesis Defense
- Location: 34-401A
Additional Location Details:
There will also be a Zoom link available. Interested attendees can contact me (email@example.com) for details.