Speaker: Rebekah Overdorf, Drexel University
Website fingerprinting attacks aim to uncover which webpage a user is visiting by monitoring the incoming and outgoing packets on a network. These attacks apply supervised classifiers to network traffic traces to identify patterns that are unique to a website. Even anonymous communication systems such as Tor are vulnerable to this type of attack. The adversary first records traffic from visits to a set of known websites of interest and extracts a website fingerprint for each one. Because the adversary can observe the communication between the client and the entry node to the Tor network, they can then record the client's unlabeled traffic and classify it against the known fingerprints.
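The two phases described above (build fingerprints from labeled visits, then classify unlabeled traffic) can be sketched in a few lines. This is only a toy illustration under stated assumptions: the trace format (a list of signed packet sizes), the four summary features, the nearest-centroid classifier, and the names `extract_features`, `train`, and `classify` are all hypothetical stand-ins, not the methods evaluated in the talk, which rely on much richer features and stronger supervised classifiers. The traces are synthetic, not real Tor traffic.

```python
from math import dist
from statistics import mean


def extract_features(trace):
    """Summarize a trace of signed packet sizes (+ outgoing, - incoming).

    Assumed toy features: packet counts and byte totals per direction.
    """
    outgoing = [p for p in trace if p > 0]
    incoming = [-p for p in trace if p < 0]
    return (len(outgoing), len(incoming), sum(outgoing), sum(incoming))


def train(labeled_traces):
    """Build one fingerprint (mean feature vector) per known site."""
    by_site = {}
    for site, trace in labeled_traces:
        by_site.setdefault(site, []).append(extract_features(trace))
    return {
        site: tuple(mean(column) for column in zip(*vectors))
        for site, vectors in by_site.items()
    }


def classify(fingerprints, trace):
    """Return the known site whose fingerprint is closest to the trace."""
    features = extract_features(trace)
    return min(fingerprints, key=lambda site: dist(fingerprints[site], features))


# Phase 1: the adversary visits known sites and records labeled traces.
training = [
    ("site_a", [600, -1500, -1500, 600, -1500]),
    ("site_a", [600, -1500, -1500, -1500, 600, -1500]),
    ("site_b", [600, -500, 600, -500, 600]),
    ("site_b", [600, -500, 600, 600, -500]),
]
fingerprints = train(training)

# Phase 2: the adversary observes an unlabeled trace and classifies it.
unlabeled = [600, -1500, -1500, -1500, 600]
guess = classify(fingerprints, unlabeled)
```

With these synthetic traces the unlabeled visit is dominated by large incoming packets, so it sits far closer to the `site_a` fingerprint than to `site_b`. The same intuition underlies the analysis in this talk: sites whose traffic summaries stand apart from all others are the easiest to fingerprint.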
We aim to better inform defenses against these attacks. We perform multi-level feature analysis of website fingerprinting attacks using three state-of-the-art website fingerprinting methods on 482 Tor onion (hidden) services, the largest analysis of this kind completed on hidden services to date. Recent studies have shown that Tor hidden service websites are particularly vulnerable to website fingerprinting attacks due to their limited number and sensitive nature. We further show that certain sites are more vulnerable to such attacks than others. We use several methods to rank the features used by these three methods, all drawn from prior work, to determine what makes a hidden service site easily unveiled by a website fingerprinting attack. Further, we analyze web-level features to determine which correlate with fingerprintable sites, and we present an analysis of misclassifications to inform guidelines for reducing the fingerprintability of a hidden service website.
Rebekah Overdorf is a fifth-year Ph.D. candidate at Drexel University. She currently works in the Privacy, Security, and Automation Lab with Dr. Rachel Greenstadt. Rebekah is interested in developing and applying machine learning methods to privacy and security problems, specifically problems in which the domain of the data used to train a model differs from that of the target data.