Tracking Wildlife Trafficking in the Age of Online Marketplaces
A team of researchers led by Juliana Freire is using data science to track — and combat — the illegal animal trade.
New analysis of the global trade in animal parts reveals tiger sharks to be one of the most popular sources of illegal trophies.
Wildlife trafficking is one of the world’s most widespread illegal trades, contributing to biodiversity loss, organized crime, and public health risks. Once concentrated in physical markets, much of this activity has moved online. Today, animals and animal products are advertised on large e-commerce platforms alongside ordinary consumer goods. This shift makes enforcement harder — but it also creates a valuable source of data.
Every online advertisement leaves behind digital information: text descriptions, prices, images, seller details, and timestamps. If collected and analyzed at scale, these traces can help researchers understand how wildlife trafficking operates online. The problem is volume. Online marketplaces contain millions of listings, and most searches for animal names return irrelevant results such as toys, artwork, or souvenirs. Distinguishing illegal wildlife ads from harmless products is difficult to do manually and challenging to automate.
Institute Professor of Computer Science Juliana Freire is part of a team that is taking on the problem head on, building a scalable system designed to address this challenge. They developed a flexible data collection pipeline that automatically gathers wildlife-related advertisements from the web and filters them using modern machine learning techniques. The goal is not to focus on one species or one website, but to enable broad, systematic monitoring across many platforms, regions, and languages, as well as to develop strategies to disrupt illegal markets.
The team is a multi-disciplinary effort, including Gohar Petrossian, Professor of Criminal Justice at John Jay College of Criminal Justice; Jennifer Jacquet, Professor of Environmental Science and Policy at the University of Miami; and Sunandan Chakraborty, Professor of Data Science at Indiana University.
The pipeline begins with web crawling. The researchers generate tens of thousands of search URLs by combining endangered species names with the search structures of major e-commerce websites. A specialized crawler then follows these links, downloading product pages while limiting requests to avoid overwhelming servers. Over just 34 days, the system retrieved more than 11 million ads.
Next comes information extraction. Product pages are messy and inconsistent, varying widely across websites. The pipeline uses a combination of HTML parsing tools and automated scrapers to extract useful details such as titles, descriptions, prices, images, and seller information. These data are stored in structured formats that allow large-scale analysis.
The most critical step is filtering. While machine learning classifiers can be used for this filtering, training specialized classifiers for multiple collection tasks is both time-consuming and expensive, requiring experts to create training data for each task. Freire’s group developed a new approach that leverages large-language models (LLMs) to label data and use the labeled data to automatically create specialized classifiers, which can perform data triage at a low cost and at scale.
This research has enabled large-scale data collection to answer different scientific questions and shed insights into different aspects of wildlife trafficking. One analysis of 14,000 reptile leather product listings on eBay showed that crocodile, alligator, and python skins dominated the market. Only about 10 animal-product combinations (such as ‘crocodile bags’, ‘alligator bags’ and ‘alligator watches’ made up about 72 percent of all listings, indicating that the trade heavily focuses on a few luxury items. The analysis of all of the listings from these sites showed that while small leather products were shipped from 65 countries, 93 percent came from 10 countries, with the United States, United Kingdom, Australia collectively accounting for over 3/4th of this market.
Similar data from Ebay on shark and ray trophies reveals that, although the platform has introduced policies to restrict threatened or endangered species, their derivatives are still circulated widely on the platform. Tiger shark trophies accounted for one-fifth of such listings, with asking prices up to $3,000. Over 85 percent of listings were linked to sellers in the United States, suggesting a pipeline from deep sea commercial fishing vessels to the US trophy trade.
This research is also being used to determine what would be the most effective way to disrupt this market. For example, the researchers found that targeting key sellers is effective, but targeting key product types — “alligator watch,” for example — breaks the market of reptile leather products equally effectively, and is much easier to enact at a broad scale.
The authors emphasize that this system is a starting point, not a finished solution. The pipeline is designed to be extensible, allowing future researchers to incorporate better classifiers, image-based analysis, or new data sources. By making the code openly available, they aim to support broader collaboration.
As wildlife trade continues to move online, understanding its digital footprint will be increasingly important. Scalable data collection tools like this one offer a way to transform scattered online listings into actionable knowledge, an essential step toward disrupting illegal wildlife trade in the digital era.
Juliana Silva Barbosa, Ulhas Gondhali, Gohar Petrossian, Kinshuk Sharma, Sunandan Chakraborty, Jennifer Jacquet, and Juliana Freire. 2025. A Cost-Effective LLM-based Approach to Identify Wildlife Trafficking in Online Marketplaces. Proc. ACM Manag. Data 3, 3, Article 119 (June 2025), 23 pages. https://doi.org/10.1145/3725256