Co-optimize DNN Arithmetics and Hardware System for Efficient Inference and Training

Lecture / Panel
For NYU Community



Sai Qian Zhang
Meta Reality Labs




In recent years, we have seen a proliferation of sophisticated Deep Neural Network (DNN) architectures that achieve state-of-the-art performance across a variety of domains. However, this algorithmic superiority comes at the cost of high latency and energy consumption at every computing scale, posing significant challenges to the hardware platforms that execute these models. Because DNN architectures and the hardware platforms executing them are tightly coupled, my research builds a full-stack solution that co-optimizes DNNs across the architecture, datatype, and supporting hardware system to achieve efficient inference and training.

In this talk, I will first describe Column-Combining, an innovative pruning strategy that packs sparse filter matrices into a denser format for efficient deployment on a novel systolic architecture with near-perfect utilization. I will then describe a bit-level quantization method named Term Quantization (TQ). Unlike conventional quantization methods that operate on individual values, Term Quantization is a group-based method that keeps a fixed number of the largest terms (nonzero bits in the binary representations) within a group of values, which leads to significantly smaller quantization error than other quantization approaches at the same bitwidth. Next, I will introduce the work I have done to facilitate the DNN training process. In particular, I will describe the Fast First, Accurate Second Training (FAST) system, which adaptively adjusts the precision of DNN operands for efficient training. Last but not least, I will conclude with some of my recent research efforts and future research plans for further extending the frontiers of DNN training hardware efficiency by leveraging the underlying reversibility of DNN architectures.


Sai Qian Zhang is a research scientist at Meta Reality Labs. He also holds an appointment as a research associate at Harvard University, hosted by Prof. David Brooks and Prof. Gu-Yeon Wei. Sai received his Ph.D. from Harvard University in 2021 under the supervision of Prof. H.T. Kung, and obtained both his M.A.Sc. and B.A.Sc. degrees from the University of Toronto.

Sai’s research interest lies in algorithm and hardware co-design for efficient deep neural network implementations. He is also interested in multi-agent reinforcement learning and its applications to hardware system design. His work has been published in multiple top-tier conferences, such as ASPLOS, NeurIPS, HPCA, and AAAI. He won the Best Paper Award at the IEEE International Conference on Communications.