Meet Alum Hao Fu – Electrical Engineering M.S. '19, Ph.D. '24
Building Trust in an Age of Intelligent Machines
Hao Fu arrived at NYU Tandon in 2017, as excitement about machine learning and AI was reaching a fever pitch. (Just the year before, Google DeepMind's AlphaGo program had defeated a world-champion player at Go, a complex board game, demonstrating that AI could master tasks previously thought impossible.)
Studying under Professor Farshad Khorrami, he earned his master’s degree in Electrical Engineering in 2019, followed by his doctoral degree five years later. Now a Machine Learning Engineer at TikTok, Fu began amassing a body of scholarly work while still a student. His story, however, isn't just about individual papers or techniques. It's a tale of systematically building trust in artificial intelligence. From detecting hidden backdoors to spotting unusual inputs, from protecting privacy in collaboration to ensuring system resilience, his research has addressed a fundamental question: How do we deploy AI systems we can rely on?
The answer, he discovered, lies not in perfect defenses but in adaptive, intelligent guardianship: AI systems that can detect when something is wrong, learn from experience, and gracefully handle the unexpected.
Early in his academic career, Fu addressed one of AI's most insidious vulnerabilities: backdoor attacks. (Imagine training a security camera AI to detect intruders, only for someone to secretly teach the system to ignore anyone wearing a red hat. Because the system works perfectly in all other cases, the sabotage can be nearly invisible.)
His first major contribution was RAID (Removing Adversarial-Backdoors by Iterative Demarcation), which introduced a novel philosophy: what if we could create a living defense that learns and adapts in real-time? The approach was elegant. Using only a small collection of trusted, clean data, RAID trained two guardians: a novelty detector and a shallow neural network. As new data streamed in, these guardians flagged anything suspicious. But RAID didn't stop there. It also used an anomaly detector to carefully separate false alarms from real threats, then trained a support vector machine that continuously updated itself as it encountered new attacks. It was like having a security system that not only detected intruders but learned their tactics and grew stronger with each attempted breach.
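To make that flow concrete, here is a heavily simplified sketch of such an adaptive pipeline. It is not the published RAID implementation: the data is synthetic, and off-the-shelf scikit-learn components stand in for each stage (a one-class SVM and a small neural network as the two guardians, an isolation forest as the anomaly filter, and an SVM as the continuously refit final detector).

```python
import numpy as np
from sklearn.svm import OneClassSVM, SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# A small pool of verified-clean data is all the defense assumes.
X_clean = rng.normal(0.0, 1.0, size=(200, 16))
y_clean = rng.integers(0, 2, size=200)

# Guardian 1: a novelty detector fit on clean data only.
novelty = OneClassSVM(nu=0.05).fit(X_clean)
# Guardian 2: a shallow network trained on the clean labels.
shallow = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500).fit(X_clean, y_clean)

# As new data streams in, flag anything either guardian finds suspicious.
X_batch = np.vstack([rng.normal(0.0, 1.0, size=(40, 16)),    # clean-looking inputs
                     rng.normal(3.0, 1.0, size=(10, 16))])   # shifted inputs (stand-in "trigger")
flags = (novelty.predict(X_batch) == -1) | (shallow.predict_proba(X_batch).max(axis=1) < 0.6)
suspicious = X_batch[flags]

# An anomaly detector separates false alarms from likely real threats.
if len(suspicious):
    iso = IsolationForest(random_state=0).fit(X_clean)
    likely_poison = suspicious[iso.predict(suspicious) == -1]

    # Final stage: an SVM that separates clean from poisoned inputs and is refit
    # as new suspected attacks accumulate over time.
    if len(likely_poison):
        X_train = np.vstack([X_clean, likely_poison])
        y_train = np.hstack([np.zeros(len(X_clean)), np.ones(len(likely_poison))])
        poison_filter = SVC(probability=True).fit(X_train, y_train)
```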
Fu’s next breakthrough insight: trigger features behave fundamentally differently from normal features. A hidden backdoor trigger dominates the AI's decision-making in a way that natural features never do. It's like a hypnotic suggestion that overrides normal reasoning. He thus developed five metrics to measure this differential influence. By creating synthetic samples — mixing suspicious inputs with clean data — the metrics could expose the unnatural dominance of trigger features. Five specialized detectors, each trained on one metric, were then combined into a meta-detector that could spot poisoned inputs with remarkable accuracy. His approach required only a small amount of verified clean data and worked even when the attacker used sophisticated, invisible triggers.
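As a rough illustration of that dominance idea (with a made-up blending score standing in for the five published metrics), one could mix a suspect input with clean samples, measure how stubbornly the model's prediction stays locked, and feed several such scores into a simple meta-classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def dominance_score(model, x_suspect, clean_samples, alphas=(0.3, 0.5, 0.7)):
    """Fraction of clean/suspect blends that still receive the suspect input's label.

    A stand-in metric: a genuine trigger tends to keep its label even when most
    of the input is replaced with clean content. `model` is any classifier with
    a scikit-learn-style predict() method.
    """
    base_label = model.predict(x_suspect[None, :])[0]
    locked, total = 0, 0
    for x_clean in clean_samples:
        for a in alphas:
            blend = a * x_suspect + (1 - a) * x_clean
            locked += int(model.predict(blend[None, :])[0] == base_label)
            total += 1
    return locked / total

def fit_meta_detector(metric_scores, labels):
    """Combine per-metric scores (shape: n_inputs x n_metrics) into one detector.

    Labels come from a small verified set: 1 = poisoned, 0 = clean.
    """
    return LogisticRegression().fit(metric_scores, labels)
```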
Fu next explored whether poisoned samples were more robust to noise than clean ones. (Think about it: a backdoor trigger is like a password — add random noise to it, and the essential pattern is still apparent because the AI has been trained to lock onto it. Natural features, however, degrade more easily under noise.)
This observation became the foundation of a method for detecting all-to-one backdoor attacks regardless of where the trigger appears in the input. In an all-to-one attack, Fu explains, any input embedded with the backdoor trigger is misclassified as a single, attacker-chosen target label, no matter what its true class is. The backdoor overrides the model's natural decision boundaries, enforcing the same misclassification across every input category.
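A toy version of that robustness check might look like the following; the noise model and the interpretation of the score are illustrative assumptions, not the published procedure.

```python
import numpy as np

def prediction_stability(model, x, noise_std=0.1, n_trials=20, seed=0):
    """Fraction of noisy copies of x that keep the original predicted label.

    Scores near 1.0 suggest a trigger-locked (likely poisoned) input; clean
    inputs near a decision boundary tend to flip labels more often. `model` is
    any classifier with a scikit-learn-style predict() method.
    """
    rng = np.random.default_rng(seed)
    base_label = model.predict(x[None, :])[0]
    hits = 0
    for _ in range(n_trials):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        hits += int(model.predict(noisy[None, :])[0] == base_label)
    return hits / n_trials
```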
When the World Changes: CLIPScope
But backdoor attacks weren't the only threat Fu addressed. Sometimes AI systems fail not because of malice, but because the world has surprised them. A medical-diagnosis AI trained on one hospital's equipment might fail catastrophically with another hospital's scans. A self-driving car trained in California might panic in a Boston snowstorm. These out-of-distribution problems needed a different kind of solution.
Enter CLIPScope, which leverages recent advances in vision-language AI models — systems that understand both images and their textual descriptions. But Fu’s innovation isn't in the foundation model; it’s in how CLIPScope uses it. Previous methods asked, “Is this input unusual, regardless of what the system predicts?” and rejected or ignored the system’s output based solely on the input’s characteristics. Fu’s method asks, “Is this input unusual given the system’s prediction?” Rather than assessing the input in isolation, CLIPScope evaluates whether the input aligns with or deviates from the expected distribution for the predicted class, allowing more targeted, context-aware detection of abnormal inputs.
Fu’s Bayesian scoring approach conducted a probabilistic reality check, and CLIPScope mined a vast lexical database to identify potential out-of-distribution classes, selecting both the nearest and farthest concepts from normal categories to maximize coverage. As a result, zero-shot detection — spotting unusual inputs without ever seeing examples during training — became practical and effective.
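In spirit, though not in its exact formulation, that class-aware check can be pictured as comparing an image embedding against text embeddings for both the in-distribution class names and the mined out-of-distribution candidates. The simplified softmax posterior below is only a stand-in for CLIPScope's Bayesian score, and the embeddings are assumed to come from a CLIP-style encoder.

```python
import numpy as np

def in_distribution_probability(image_emb, id_class_embs, ood_class_embs, temperature=0.01):
    """Softmax posterior that the image belongs to some in-distribution class.

    image_emb: unit-norm embedding of the input image.
    id_class_embs / ood_class_embs: unit-norm text embeddings of class names,
    the OOD names being mined from a lexical database.
    """
    candidates = np.vstack([id_class_embs, ood_class_embs])
    logits = candidates @ image_emb / temperature
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    return probs[: len(id_class_embs)].sum()   # low values flag likely OOD inputs
```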
Tackling Other Problems
Fu's vision extends beyond detection to system-level resilience. In cyber-physical systems (think power grids, aircraft, or industrial robots), attacks can have catastrophic real-world consequences. His approach combines switching, re-initialization, and anomaly detection into an integrated defense: multiple controllers operate in rotation, each is periodically re-initialized to purge potential compromises, and an anomaly detector watches for attacks during operation. If something goes wrong, the system switches to a fresh controller. Compromised controllers are not permanently discarded; they are cleaned and returned to service, so the system maintains its defensive depth while continuously adapting.
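The pattern can be pictured with a toy control loop like the one below, in which the controllers and the anomaly detector are trivial stubs rather than anything from the published design.

```python
from dataclasses import dataclass, field

@dataclass
class Controller:
    state: list = field(default_factory=list)

    def reinitialize(self):
        self.state.clear()            # purge any state a compromise could hide in

    def act(self, measurement):
        self.state.append(measurement)
        return -0.5 * measurement     # placeholder feedback law

def anomalous(measurement, threshold=10.0):
    return abs(measurement) > threshold   # stand-in anomaly detector

controllers = [Controller() for _ in range(3)]
active = 0
for measurement in [0.2, 0.1, 25.0, 0.3]:     # toy sensor readings
    if anomalous(measurement):
        controllers[active].reinitialize()        # clean it instead of discarding it
        active = (active + 1) % len(controllers)  # switch to a fresh controller
    command = controllers[active].act(measurement)
```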
Fu’s research has also addressed a different challenge: how can organizations collaborate to build better AI without exposing their private data? The answer lay in feature extraction. Instead of sharing raw data, entities could share transformed feature embeddings — mathematical representations that preserved useful patterns while obscuring sensitive details.
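A minimal sketch of that idea, with a fixed random projection standing in for whatever feature extractor the collaborating parties actually agree on, might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
projection = rng.normal(size=(64, 16))   # shared, fixed feature extractor (illustrative stand-in)

def extract_features(raw_records):
    """Run locally at each organization; only the 16-dimensional embeddings leave the site."""
    return np.asarray(raw_records) @ projection

site_a_embeddings = extract_features(rng.normal(size=(100, 64)))   # raw data never shared
site_b_embeddings = extract_features(rng.normal(size=(120, 64)))
pooled = np.vstack([site_a_embeddings, site_b_embeddings])         # pooled embeddings train the joint model
```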
In an age where AI is increasingly used to make critical decisions in healthcare, transportation, finance, infrastructure, and other sectors, Fu’s work provides the foundations for deploying these systems responsibly. Not with blind faith in their perfection, but with intelligent mechanisms to catch their failures before they cause harm.