Events

Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning

Lecture / Panel
 
For NYU Community

""

Speaker:

Thierry Tambe
PhD Candidate, Harvard University

Title:

"Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning"

Abstract:

The unabated pursuit of omniscient and omnipotent AI is levying hefty latency, memory, and energy taxes at all computing scales. At the same time, the end of Dennard scaling is sunsetting the traditional performance gains attained through reductions in transistor feature size. Faced with these challenges, my research is building a heterogeneity of solutions co-optimized across the algorithm, memory subsystem, hardware architecture, and silicon stack to generate breakthrough advances in arithmetic performance, compute density and flexibility, and energy efficiency for on-chip machine learning, and natural language processing (NLP) in particular. I will start, on the algorithm front, by discussing award-winning work on AdaptivFloat, a novel floating-point-based data type that enables resilient quantized AI computations and is particularly well suited to NLP networks with very wide parameter distributions. Then, I will describe a 16nm chip prototype that adopts AdaptivFloat to accelerate noise-robust AI speech and machine translation tasks, and whose fidelity to the front-end application is verified via a formal hardware/software compiler interface. Towards the goal of lowering the prohibitive energy cost of large language model inference on TinyML devices, I will describe a principled algorithm-hardware co-design solution, validated in a 12nm chip tapeout, that accelerates Transformer workloads by tailoring the accelerator's latency and energy expenditures to the complexity of the input query it processes. Finally, I will conclude with some of my current and future research efforts on further pushing the on-chip energy-efficiency frontier by leveraging specialized, non-conventional dynamic memory structures for on-device training, recently prototyped in a 16nm tapeout.
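
For context on the AdaptivFloat data type mentioned in the abstract: the published idea is to adapt the exponent range of a narrow floating-point encoding to each tensor's value distribution rather than fixing it globally. The NumPy sketch below is only an illustrative approximation of that per-tensor adaptation, under assumed bit-field and rounding choices; the function name and parameters are hypothetical, and it is not the speaker's implementation.

import numpy as np

def adaptivfloat_quantize(x, n_bits=8, n_exp=3):
    # Hypothetical sketch of an AdaptivFloat-style quantizer, not the
    # speaker's code: 1 sign bit, n_exp exponent bits, remaining bits
    # for the mantissa.
    n_man = n_bits - 1 - n_exp
    max_abs = float(np.max(np.abs(x)))
    if max_abs == 0.0:
        return np.zeros_like(x, dtype=np.float64)
    # Adapt the exponent range to the tensor: anchor the largest
    # representable exponent to the exponent of the tensor's max magnitude.
    exp_max = int(np.floor(np.log2(max_abs)))
    exp_min = exp_max - (2 ** n_exp - 1)
    sign = np.sign(x)
    mag = np.abs(x).astype(np.float64)
    # Per-element exponent, clamped into the adapted range.
    e = np.clip(np.floor(np.log2(np.maximum(mag, 2.0 ** exp_min))), exp_min, exp_max)
    scale = 2.0 ** e
    # Round the normalized mantissa to n_man fractional bits.
    mant = np.round(mag / scale * 2 ** n_man) / 2 ** n_man
    q = sign * mant * scale
    # Saturate at the largest representable magnitude ...
    q_max = (2.0 - 2.0 ** -n_man) * 2.0 ** exp_max
    q = np.clip(q, -q_max, q_max)
    # ... and flush magnitudes below the representable range to zero.
    q[mag < 2.0 ** exp_min] = 0.0
    return q

# Example: quantize a heavy-tailed weight tensor to 8 bits.
w = np.random.laplace(scale=0.1, size=(256, 256))
w_q = adaptivfloat_quantize(w, n_bits=8, n_exp=3)
print("max abs error:", np.max(np.abs(w - w_q)))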

Bio:

Thierry Tambe is a final-year Electrical Engineering PhD candidate at Harvard University advised by Prof. Gu-Yeon Wei and Prof. David Brooks. His current research focuses on designing energy-efficient, high-performance algorithms, hardware accelerators, and systems for machine learning, and for natural language processing in particular. He also has a keen interest in agile SoC design methodologies. Prior to beginning his doctoral studies, Thierry was an engineer at Intel in Hillsboro, Oregon, USA, designing various mixed-signal architectures for high-bandwidth memory and peripheral interfaces on Xeon and Xeon Phi HPC SoCs. He received a B.S. (2010) and M.Eng. (2012) in Electrical Engineering from Texas A&M University. Thierry is a recipient of the Best Paper Award at the 2020 ACM/IEEE Design Automation Conference, a 2021 NVIDIA Graduate PhD Fellowship, and a 2022 IEEE SSCS Predoctoral Achievement Award.