DeepReDuce: ReLU Reduction for Fast Private Inference

This research was led by Brandon Reagen, assistant professor of computer science and electrical and computer engineering, with Nandan Kumar Jha, a Ph.D. student under Reagen, and Zahra Ghodsi, who obtained her Ph.D. at NYU Tandon under Siddharth Garg, Institute associate professor of electrical and computer engineering.

Concerns about data privacy are changing how companies use and store users’ data, and lawmakers are passing legislation to strengthen users’ privacy rights. Deep learning is the core driver of many applications affected by these concerns. It provides high utility in classifying, recommending, and interpreting user data to build user experiences, but it requires large amounts of private user data to do so. Private inference (PI) is a solution that provides strong privacy guarantees while preserving the utility of the neural networks that power these applications.

Homomorphic encryption, which allows inferences to be made directly on encrypted data, addresses rising privacy concerns for personal, medical, military, government and other sensitive information. However, the primary challenge facing private inference is that computing on encrypted data imposes an impractically high latency penalty, stemming mostly from non-linear operators like ReLU (rectified linear unit).

Solving this challenge requires new optimization methods that minimize network ReLU counts while preserving accuracy. One approach is to eliminate the ReLUs that contribute little to the accuracy of inferences.

“What we are trying to do there is rethink how neural nets are designed in the first place,” said Reagen. “You can skip a lot of these time- and computationally expensive ReLU operations and still get high performing networks at 2 to 4 times faster run time.”

The team proposed DeepReDuce, a set of optimizations for the judicious removal of ReLUs to reduce private inference latency. The researchers tested this by dropping ReLUs from classic networks to significantly reduce inference latency while maintaining high accuracy.
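To give a rough sense of why dropping ReLUs matters, the sketch below counts ReLU operations in a toy network and shows how the total falls when one stage loses its ReLUs. The layer names and activation shapes are hypothetical values chosen for illustration, not figures from the DeepReDuce paper, and the sketch does not model accuracy or how DeepReDuce decides which ReLUs to remove.

```python
from math import prod

# Hypothetical toy network: (stage name, activation shape as channels x height x width).
# These shapes are illustrative assumptions, not taken from the DeepReDuce paper.
LAYERS = [
    ("stage1", (64, 32, 32)),
    ("stage2", (128, 16, 16)),
    ("stage3", (256, 8, 8)),
    ("stage4", (512, 4, 4)),
]


def relu_count(layers, dropped=frozenset()):
    """Count ReLU operations: one per activation element in every stage that keeps its ReLU."""
    return sum(prod(shape) for name, shape in layers if name not in dropped)


baseline = relu_count(LAYERS)                      # every stage keeps its ReLU
reduced = relu_count(LAYERS, dropped={"stage1"})   # stage1's ReLUs are removed

print(f"baseline ReLUs: {baseline}")
print(f"after dropping stage1: {reduced}")
print(f"reduction factor: {baseline / reduced:.2f}x")
```

In this toy setup the earliest stage has the largest activation maps, so it accounts for most of the ReLUs. Whether removing any given stage preserves accuracy is an empirical question that a count like this cannot answer.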

The team found that, compared to the state of the art for private inference, DeepReDuce improved accuracy by up to 3.5% at an equal ReLU count (iso-ReLU) and reduced ReLU count by up to 3.5× at equal accuracy (iso-accuracy).

The work extends an innovation called CryptoNAS. Described in an earlier paper whose authors include Ghodsi and a third Ph.D. student, Akshaj Veldanda, CryptoNAS optimizes the use of ReLUs as one might rearrange rocks in a stream to optimize the flow of water: it rebalances the distribution of ReLUs in the network and removes redundant ReLUs.

The investigators will present their work on DeepReDuce at the 2021 International Conference on Machine Learning (ICML) from July 18-24, 2021.