Big-Data Visualization Experts Make Using Scatter Plots Easier for Today’s Researchers

Scatter plots—which use horizontal and vertical axes to plot data points and display how much one variable is affected by another—have long been employed by researchers in various fields, but the era of Big Data, especially high-dimensional data, has caused certain problems. A data set containing thousands of variables—now exceedingly common—will result in an unwieldy number of scatter plots. If a researcher is to extract useful information from the plots, that number must be winnowed in some way.

In recent years, algorithmic methods have been developed in an attempt to detect plots that contain one or more patterns of interest to the researcher analyzing them, thereby providing a measure of guidance as the data is explored. While those techniques represent an important step, little attention has been paid to validating their results by comparing them to those achieved when human observers and analysts view large sets of plots and their patterns.

Members of Tandon’s data-visualization group, headed by Professor Enrico Bertini, have conducted a study that found that results obtained through algorithmic methods, such as those known as scagnostics, do not necessarily correlate well to human perceptional judgments when asked to group scatter plots based on their similarity. While the team identified several factors that drive such perceptual judgments, they assert that further work is needed to develop perceptually balanced measures for analyzing large sets of plots, in order to better guide researchers in such fields as medicine, aerospace, and finance, who are regularly confronted with high-dimensional data.

Lead author Anshul Vikram Pandey and other members of Bertini’s lab published their findings in “Towards Understanding Human Similarity Perception in the Analysis of Large Sets of Scatter Plots,” a 2016 paper that received an honorable mention at the Association for Computing Machinery’s Conference on Human Factors in Computing Systems (ACM CHI), considered to be the most important event in that field. With well over 2,000 papers up for consideration at ACM CHI each year, that honor places their study among the top few percent of those submitted.