Searching for Big Insights from Online Reviews

There is no question that data are big: 2.5 quintillion new bytes are added every day from our keyboards, sensors, entertainment, and medical scans, to name but a few sources. Data scientists have begun refining tools to encourage civilians to join them in extracting important insights from the aggregation and analysis of big data sets.

NYU Tandon Assistant Professor of Computer Science and Engineering Enrico Bertini and his graduate student Cristian Felix recently received a $35,000 Knight Foundation Prototype Fund grant to do just that with RevEx, a tool that can perform faceted searches and analyze a combination of text and data across multiple domains. The Knight Foundation, which supports investigative journalism, will fund refinements to RevEx that will enable reporters to elicit stories, but the appeal of the tool reaches well beyond the Fourth Estate.

Even before the Knight grant, RevEx—developed with graduate student Anshul Pandey—was embraced by journalists.

The nonprofit investigative journalism unit ProPublica used it to sort through millions of online reviews of medical services on Yelp. (Some findings: Thumbs down for low-cost dental clinics; medical doctors’ staff were panned more often than the actual care; and simple transactions like hair removal scored individual highs.) 

The Economist employed RevEx to discover some fundamental unfairness in students’ assessments on Rate My Professor. (Female professors consistently ranked lower, and “horrible” rankings seemed to correlate with the difficulty of the subject.)

But the most significant work so far for RevEx has been for the United Nations. Using RevEx as a starting point, Felix devised a tool that took top honors in the U.N.’s #VisualizeChange: World Humanitarian Summit Data Challenge. Designed to analyze and visualize a staggering amount of data from citizens on a country-by-country basis, it will allow the U.N. to attack the most pressing local humanitarian problems.