Brandon Reagen
Assistant Professor
Brandon Reagen is an Assistant Professor in the Department of Electrical and Computer Engineering with an affiliated appointment in the Department of Computer Science and Engineering. He earned a PhD in computer science from Harvard in 2018 and received his undergraduate degrees in computer systems engineering and applied mathematics from the University of Massachusetts, Amherst, in 2012.
A computer architect by training, Brandon focuses his research on designing specialized hardware accelerators for applications including deep learning and privacy-preserving computation. He has made several contributions that ease the use of accelerators as general architectural constructs, including benchmarking, simulation infrastructure, and System on a Chip (SoC) design. He has led the way in highly efficient and accurate deep learning accelerator design with his studies of principled unsafe optimizations, and his work has been published in venues spanning computer architecture, machine learning, computer-aided design, and circuits.
Prior to joining NYU, he was a research scientist with Facebook’s AI Infrastructure Research working on privacy preserving machine learning and systems for neural recommendation. During his PhD he was a Siebel Scholar (2018) and was selected as a 2018 Rising Star in Computer Architecture by Georgia Tech.
Research News
New NYU Tandon-led project will accelerate privacy-preserving computing
Today's most advanced cryptographic computing technologies — which enable privacy-preserving computation — are trapped in research labs by one critical barrier: they're thousands of times too slow for everyday use.
NYU Tandon, helming a research team that includes Stanford University and the City University of New York, just received a $3.8 million grant from the National Science Foundation to build the missing infrastructure that could make those technologies practical: a new design platform and library that lets researchers develop and share chip designs.
The problem is stark. Running a simple AI model on encrypted data takes over 10 minutes instead of milliseconds, a performance gap of roughly four orders of magnitude that impedes many real-world use cases.
Current approaches to speeding up cryptographic computing have hit a wall, however. "The normal tricks that we have to get over this performance bottleneck won’t scale much further, so we have to do something different," said Brandon Reagen, the project's lead investigator. Reagen is an NYU Tandon assistant professor with appointments in the Electrical and Computer Engineering (ECE) Department and in the Computer Science and Engineering (CSE) Department. He is also on the faculty of NYU's Center for Advanced Technology in Telecommunications (CATT) and the NYU Center for Cybersecurity (CCS).
The team's solution is a new platform called “Cryptolets.”
Currently, researchers working on privacy chips must build everything from scratch. Cryptolets will provide three things: a library where researchers can share and access pre-built, optimized hardware designs for privacy computing; tools that allow multiple smaller chips to work together as one powerful system; and automated testing to ensure contributed designs work correctly and securely.
This chiplet approach — using multiple small, specialized chips working together — is a departure from traditional single, monolithic chip optimization, potentially breaking through performance barriers.
For Reagen, this project represents the next stage of his research approach. "For years, most of our academic research has been working in simulation and modeling," he said. "I want to pivot to building. I’d like to see real-world encrypted data run through machine learning workloads in the cloud without the cloud ever seeing your data. You could, for example, prove you are who you say you are without actually revealing your driver's license, social security number, or birth certificate."
What sets this project apart is its community-building approach. The researchers are creating competitions in which students and other researchers use Cryptolets to compete in designing the best chip components, with annual challenges planned at major cybersecurity and computer architecture conferences. The first workshop, focused on hardware for zero-knowledge proofs, will take place in October 2025 at MICRO 2025.
"We want to build a community, too, so everyone's not working in their own silos," Reagen said. The project will support fabrication opportunities for competition winners, with plans to assist tapeouts of smaller designs initially and larger full-system tapeouts in the later phases, helping participants who lack chip fabrication resources at their home institutions
"With Cryptolets, we are not just funding a new hardware platform—we are enabling a community-wide leap in how privacy-preserving computation can move from theory to practice,” said Deep Medhi, program director in the Computer & Information Sciences & Engineering Directorate at the U.S. National Science Foundation. “By lowering barriers for researchers and students to design, share and test cryptographic chips, this project aligns with NSF’s mission to advance secure, trustworthy and accessible technologies that benefit society at large."
If the project succeeds, it could enable a future where strong digital privacy isn't just theoretically possible, but practically deployable at scale, from protecting personal health data to securing financial transactions to enabling private AI assistants that never see people's actual queries.
Along with Reagen, the team is led by NYU Tandon co-investigators Ramesh Karri, ECE Professor and Department Chair, and faculty member of CATT and CCS; Siddharth Garg, Professor in ECE and faculty member of NYU WIRELESS and CCS; Austin Rovinski, Assistant Professor in ECE; The City College of New York’s Rosario Gennaro and Tushar Jois; and Stanford's Thierry Tambe and Caroline Trippel, with Warren Savage serving as project manager. The team also includes industry advisors from companies working on cryptographic technologies.
Cracking the code of private AI: The role of entropy in secure language models
Large Language Models (LLMs) have rapidly become an integral part of our digital landscape, powering everything from chatbots to code generators. However, as these AI systems increasingly rely on proprietary, cloud-hosted models, concerns over user privacy and data security have escalated. How can we harness the power of AI without exposing sensitive data?
A recent study, Entropy-Guided Attention for Private LLMs, by Nandan Kumar Jha, a Ph.D. candidate at the NYU Center for Cybersecurity (CCS), and Brandon Reagen, Assistant Professor in the Department of Electrical and Computer Engineering and a member of CCS, introduces a novel approach to making AI more secure. The paper was presented at the AAAI Workshop on Privacy-Preserving Artificial Intelligence in early March.
The researchers delve into a fundamental, yet often overlooked, property of neural networks: entropy — the measure of information uncertainty within a system. Their work proposes that by understanding entropy’s role in AI architectures, we can improve the privacy, efficiency, and reliability of LLMs.
The Privacy Paradox in AI
When we interact with AI models — whether asking a virtual assistant for medical advice or using AI-powered legal research tools — our input data is typically processed in the cloud. This means user queries, even if encrypted in transit, are ultimately decrypted for processing by the model. This presents a fundamental privacy risk: sensitive data could be exposed, either unintentionally through leaks or maliciously via cyberattacks.
To design efficient private LLMs, researchers must rethink the architecture these models are built on, because the nonlinear operations that are cheap to compute in plaintext become the dominant cost under encryption. However, simply removing nonlinearities destabilizes training and disrupts the core functionality of components like the attention mechanism.
“Nonlinearities are the lifeblood of neural networks,” says Jha. “They enable models to learn rich representations and capture complex patterns.”
The field of Private Inference (PI) aims to solve this problem by allowing AI models to operate directly on encrypted data, ensuring that neither the user nor the model provider ever sees the raw input. However, PI comes with significant computational costs. Encryption methods that protect privacy also make computation more complex, leading to higher latency and energy consumption — two major roadblocks to practical deployment.
To tackle these challenges, Jha and Reagen’s research focuses on the nonlinear transformations within AI models. In deep learning, nonlinear functions like activation functions play a crucial role in shaping how models process information. The researchers explore how these nonlinearities affect entropy — specifically, the diversity of information being passed through different layers of a transformer model.
“Our work directly tackles this challenge and takes a fundamentally different approach to privacy,” says Jha. “It removes nonlinear operations while preserving as much of the model’s functionality as possible.”
Using Shannon’s entropy as a quantitative measure, they reveal two key failure modes that occur when nonlinearity is removed:
- Entropy Collapse (Deep Layers): In the absence of nonlinearity, later layers in the network fail to retain useful information, leading to unstable training.
- Entropic Overload (Early Layers): Without proper entropy control, earlier layers fail to efficiently utilize the Multi-Head Attention (MHA) mechanism, reducing the model’s ability to capture diverse representations.
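To make these failure modes concrete, here is a minimal PyTorch sketch (not the authors' code) showing how the Shannon entropy of per-head attention distributions could be measured and flagged; the tensor shapes and the 0.1/0.9 thresholds are illustrative assumptions.

```python
# Minimal, illustrative sketch: flag heads whose attention entropy is
# suspiciously low ("entropy collapse") or high ("entropic overload").
import torch

def attention_entropy(attn_weights: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """attn_weights: (batch, heads, query_len, key_len), rows summing to 1.
    Returns the mean Shannon entropy per head, shape (heads,)."""
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)  # (batch, heads, query_len)
    return entropy.mean(dim=(0, 2))

# Toy example: random attention maps over a sequence of 16 keys.
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
per_head = attention_entropy(attn)
max_entropy = torch.log(torch.tensor(16.0))  # entropy of uniform attention over 16 keys

for h, ent in enumerate(per_head):
    if ent < 0.1 * max_entropy:    # near one-hot attention: possible entropy collapse
        print(f"head {h}: possible entropy collapse ({ent:.3f} nats)")
    elif ent > 0.9 * max_entropy:  # near uniform attention: possible entropic overload
        print(f"head {h}: possible entropic overload ({ent:.3f} nats)")
```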
This insight is new — it suggests that entropy isn’t just a mathematical abstraction but a key design principle that determines whether a model can function properly.
A New AI Blueprint
Armed with these findings, the researchers propose an entropy-guided attention mechanism that dynamically regulates information flow in transformer models. Their approach consists of Entropy Regularization — a new technique that prevents early layers from being overwhelmed by excessive information — and PI-Friendly Normalization — alternative methods to standard layer normalization that help stabilize training while preserving privacy.
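As a rough illustration of the entropy-regularization idea, the sketch below adds a penalty that pushes each head's mean attention entropy toward a target fraction of the maximum possible entropy; the quadratic penalty form, the target fraction, and the coefficient are assumptions made for the example, not the paper's exact formulation.

```python
# Hedged sketch of an entropy-regularization penalty on attention maps.
# The quadratic penalty, target_frac, and coeff are illustrative assumptions.
import torch

def entropy_penalty(attn_weights: torch.Tensor, target_frac: float = 0.5,
                    coeff: float = 0.01, eps: float = 1e-9) -> torch.Tensor:
    """attn_weights: (batch, heads, query_len, key_len); rows sum to 1."""
    # Shannon entropy of each attention row, averaged to one value per head.
    per_head = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1).mean(dim=(0, 2))
    max_entropy = torch.log(torch.tensor(float(attn_weights.shape[-1]),
                                         device=attn_weights.device))
    target = target_frac * max_entropy
    # Penalize heads that drift toward collapse (too low) or overload (too high).
    return coeff * ((per_head - target) ** 2).mean()

# Usage inside a training step (task_loss and attention maps computed elsewhere):
#   loss = task_loss + sum(entropy_penalty(a) for a in attention_maps)
#   loss.backward()
```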
By strategically regulating the entropy of attention distributions, they were able to maintain coherent, trainable behavior even in drastically simplified models. This keeps attention weights meaningful and avoids the degenerate patterns that commonly arise once nonlinearity is removed, in which a disproportionate number of heads exhibit extreme behavior, collapsing to near one-hot attention (low entropy) or diffusing attention uniformly (high entropy), both of which impair the model's ability to focus and generalize.
This work establishes entropy dynamics as a principled guide for developing efficient privacy-preserving LLMs. By bridging the gap between information theory and neural architecture design, it offers a roadmap for AI models that are not only more private but also computationally efficient, a crucial step toward making privacy-preserving AI practical in real-world applications.
The team has also open-sourced their implementation, inviting researchers and developers to experiment with their entropy-guided approach.
Paper: Entropy-Guided Attention for Private LLMs, arXiv:2501.03489 [cs.LG].
DeepReDuce: ReLU Reduction for Fast Private Inference
This research was led by Brandon Reagen, assistant professor of computer science and electrical and computer engineering, with Nandan Kumar Jha, a Ph.D. student under Reagen, and Zahra Ghodsi, who obtained her Ph.D. at NYU Tandon under Siddharth Garg, Institute associate professor of electrical and computer engineering.
Concerns surrounding data privacy are changing how companies use and store users' data, and lawmakers are passing legislation to improve users' privacy rights. Deep learning is the core driver of many applications affected by these concerns: it provides high utility in classifying, recommending, and interpreting user data to build user experiences, and it requires large amounts of private user data to do so. Private inference (PI) is a solution that provides strong privacy guarantees while preserving the utility of neural networks to power applications.
Homomorphic encryption, which allows inference to run directly on encrypted data, addresses the rise of privacy concerns for personal, medical, military, government, and other sensitive information. However, the primary challenge facing private inference is that computing on encrypted data levies an impractically high latency penalty, stemming mostly from nonlinear operators like ReLU (the rectified linear unit activation function).
Solving this challenge requires new optimization methods that minimize a network's ReLU count while preserving accuracy. One approach is to eliminate the ReLU operations that contribute little to the accuracy of inferences.
“What we are trying to do there is rethink how neural nets are designed in the first place,” said Reagen. “You can skip a lot of these time- and computationally expensive ReLU operations and still get high-performing networks at 2 to 4 times faster run time.”
The team proposed DeepReDuce, a set of optimizations for the judicious removal of ReLUs to reduce private inference latency. The researchers tested this by dropping ReLUs from classic networks to significantly reduce inference latency while maintaining high accuracy.
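For readers who want a feel for what "dropping ReLUs" looks like in code, here is a small PyTorch sketch that counts the ReLU modules in a standard ResNet-18 and replaces those in its last two residual stages with identity; deciding which ReLUs can be removed without hurting accuracy is the hard part that DeepReDuce's optimizations address, and the layer choice below is only an assumption for illustration.

```python
# Illustrative sketch only (not the DeepReDuce algorithm): count the ReLU
# modules in a ResNet-18 and replace those in its last two stages with no-ops.
# The "layer3"/"layer4" choice is an assumption made for the example.
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18()
print("ReLU modules before:", sum(isinstance(m, nn.ReLU) for m in model.modules()))

# Swap ReLUs inside the later residual stages for identity modules.
targets = [(name, m) for name, m in model.named_modules()
           if name.startswith(("layer3", "layer4"))]
for name, module in targets:
    for child_name, child in list(module.named_children()):
        if isinstance(child, nn.ReLU):
            setattr(module, child_name, nn.Identity())

print("ReLU modules after:", sum(isinstance(m, nn.ReLU) for m in model.modules()))
```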
The team found that, compared to the state of the art for private inference, DeepReDuce improved accuracy by up to 3.5% at equal ReLU counts and reduced ReLU counts by up to 3.5× at equal accuracy.
The work extends an earlier innovation called CryptoNAS. Described in a prior paper whose authors include Ghodsi and a third Ph.D. student, Akshaj Veldanda, CryptoNAS optimizes the use of ReLUs much as one might rearrange rocks in a stream to optimize the flow of water: it rebalances the distribution of ReLUs in the network and removes redundant ones.
The investigators will present their work on DeepReDuce at the 2021 International Conference on Machine Learning (ICML) from July 18-24, 2021.