Towards Making Systems Forget with Machine Unlearning
Speaker: Yinzhi Cao, Lehigh University
Today’s systems produce a rapidly exploding amount of data, and that data is in turn used to derive more data, forming a complex data propagation network that we call the data’s lineage. There are many reasons why users want systems to forget certain data, including its lineage. From a privacy perspective, users who become concerned about new privacy risks of a system often want the system to forget their data and its lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user may want to remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems: systems capable of forgetting certain data and their lineage completely and quickly.
This talk focuses on making learning systems forget, a process we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach that transforms the learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations, which is asymptotically faster than retraining from scratch. Our approach is general because the summation form comes from statistical query learning, a framework in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation on four diverse learning systems and real-world workloads shows that our approach is general, effective, fast, and easy to use.
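To make the summation idea concrete, here is a minimal sketch under our own assumptions: a multinomial naive Bayes text classifier, chosen because its entire model state is a few counts, i.e. sums over training samples. The class and method names (`SummationNaiveBayes`, `learn`, `forget`, `log_scores`) are hypothetical and are not the talk's implementation. Learning adds one sample's contribution to each sum; forgetting subtracts exactly that contribution, costing time proportional to the sample's length rather than to the whole training set.

```python
import math
from collections import Counter, defaultdict


class SummationNaiveBayes:
    """Multinomial naive Bayes whose whole model is a handful of sums.

    Hypothetical illustration only; not the system described in the talk.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # sum_i [y_i == c]
        self.class_words = Counter()             # sum_i len(x_i) over samples with y_i == c
        self.word_counts = defaultdict(Counter)  # sum_i count_i(w) over samples with y_i == c
        self.doc_freq = Counter()                # sum_i [w in x_i], used to size the vocabulary

    def _update(self, tokens, label, sign):
        # sign = +1 adds the sample's terms to every sum; sign = -1 removes them.
        self.class_docs[label] += sign
        self.class_words[label] += sign * len(tokens)
        for w, k in Counter(tokens).items():
            self.word_counts[label][w] += sign * k
            self.doc_freq[w] += sign

    def learn(self, tokens, label):
        self._update(tokens, label, +1)

    def forget(self, tokens, label):
        # Assumes (tokens, label) was learned earlier; no retraining needed.
        self._update(tokens, label, -1)

    def log_scores(self, tokens):
        n = sum(self.class_docs.values())
        vocab = sum(1 for f in self.doc_freq.values() if f > 0)
        scores = {}
        for c, nc in self.class_docs.items():
            if nc <= 0:  # every sample of this class has been forgotten
                continue
            denom = self.class_words[c] + self.alpha * vocab
            s = math.log(nc / n)
            for w in tokens:
                s += math.log((self.word_counts[c][w] + self.alpha) / denom)
            scores[c] = s
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Usage: learn everything, then forget a (hypothetically injected) sample.
data = [
    ("cheap pills buy now".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("buy cheap watches".split(), "spam"),
    ("lunch at noon tomorrow".split(), "ham"),
    ("free pills".split(), "spam"),  # pretend this sample was injected
]
model = SummationNaiveBayes()
for x, y in data:
    model.learn(x, y)
model.forget(*data[-1])
```

Because every statistic is a plain sum, the state after `forget` is identical to the state obtained by retraining on the remaining samples, so the two models produce the same scores. Models that are not naturally sums need the transformation step the abstract describes; this sketch only covers the easy case.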
Bio: Yinzhi Cao is an assistant professor at Lehigh University. He earned his PhD in computer science at Northwestern University and worked at Columbia University as a postdoc. Before that, he obtained his B.E. degree in electronics engineering at Tsinghua University in China. His research mainly focuses on the security and privacy of the web, smartphones, and machine learning. He has published more than ten papers at various security conferences, such as Oakland, NDSS, ACSAC, and DSN. His JShield system has been adopted by Huawei, the world's largest telecommunications company, and his SafePay system was featured by many media outlets, such as NSF Science360 News and Yahoo! News. Previously, he also conducted research at SRI International and UC Santa Barbara as a summer intern.
For more information, contact Nasir Memon.