A new tool allows journalists to quickly sort through FOIA data dumps

In the 2020 fiscal year alone, federal agencies received nearly 800,000 requests under freedom of information laws. The process is notoriously frustrating, marked by delays, denials, and appeals before documents are turned over (if they ever are); for smaller newsrooms with less money and fewer staff, it can feel prohibitive.

NYU Tandon Professors Julia Stoyanovich, a member of NYU's Center for Responsible AI and the Visualizing Imaging and Data Analysis Center (VIDA), and Mona Sloane, a senior research scientist at the NYU Center for Responsible AI, worked with NYU Professor Hilke Schellmann to lead a team of graduate students at NYU’s Center for Data Science in developing Gumshoe. The artificial-intelligence tool uses natural language processing to sort through large caches of text documents and categorize them by relevance to a journalist’s main topic of investigation, reducing the time needed to sift through them.

The Gumshoe team developed the tool with an initial grant from the Center for Digital Humanities at NYU. A subsequent $200,000 grant, awarded last month by the Patrick J. McGovern Foundation, will enable the team to build out Gumshoe’s user interface and distribute the product widely. MuckRock, a nonprofit news site devoted to public records requests, plans to integrate Gumshoe into its DocumentCloud platform, which journalists use to post and review public records.