Can Artificial Intelligence Find James Bond?

David Blei Discusses Probabilistic Machine Learning in Second AI Seminar Series


What if a movie executive let a computer cast an upcoming James Bond movie based entirely on which character was most influential at the box office? He or she might wind up with a “you’re fired” memo and a James Bond movie without a 007 in it. After all, “M” and “Q” appear in every hit James Bond movie, don’t they? A simple linear regression might determine that the actor playing either of those characters is the real draw, and the guy playing Bond isn’t all that important.

Welcome to the world of confounders, the wrenches in the works of complex data sets that can reveal the limitations of machine learning (ML) systems that don’t see the whole picture (in this case, the fact that the really important casting decision is who gets to play Bond). Researchers who want to not only predict outcomes but understand the behavior of vast interrelated systems need ML that can look at multiple causes to find these hidden truths, and connect massive amounts of data to the big ideas behind scientific predictive theories and knowledge.
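To make the confounding problem concrete, here is a minimal sketch in Python using only the standard library. Everything in it is a made-up illustration rather than anything presented in the talk: the synthetic data, the variable names (franchise, supporting, revenue) and the helper functions are assumptions. It also assumes the confounder is observed, which is the easy case; the harder setting described above is when it is hidden. In the simulation, a supporting actor has no effect on revenue but is cast mostly in franchise films, and franchise films earn more.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x (one predictor plus an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center_within_groups(values, groups):
    """Subtract each group's mean from its members (removes the group effect)."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


def simulate(n_movies=5000, seed=0):
    """Synthetic movies: revenue depends only on the franchise flag."""
    rng = random.Random(seed)
    franchise, supporting, revenue = [], [], []
    for _ in range(n_movies):
        f = 1 if rng.random() < 0.3 else 0                    # confounder
        s = 1 if rng.random() < (0.9 if f else 0.1) else 0    # casting follows franchise
        r = 100.0 * f + rng.gauss(0.0, 10.0)                  # actor has zero true effect
        franchise.append(f)
        supporting.append(s)
        revenue.append(r)
    return franchise, supporting, revenue


franchise, supporting, revenue = simulate()

# Naive regression: revenue on the supporting actor alone.
naive_effect = slope(supporting, revenue)

# Adjusted regression: remove the franchise effect from both variables first,
# which matches including the franchise flag as a second predictor.
adjusted_effect = slope(
    center_within_groups(supporting, franchise),
    center_within_groups(revenue, franchise),
)

print(f"naive effect of supporting actor:    {naive_effect:.1f}")
print(f"adjusted effect of supporting actor: {adjusted_effect:.1f}")
```

In this toy setup the naive estimate credits the supporting actor with a large share of the revenue, while the adjusted estimate is close to zero, which is the true effect built into the simulation.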

David Blei, a professor of Statistics and Computer Science at Columbia University and a member of the Columbia Data Science Institute, used the fictional British secret agent to explore how next-generation artificial intelligence can not only find the wrenches but model meaning from an array of disparate, heterogeneous data points.

The third speaker in the Modern Artificial Intelligence fall seminar series hosted by the Department of Electrical and Computer Engineering at NYU Tandon and organized by Professor Anna Choromanska, Blei works in the field of probabilistic ML, which is applicable to a wide variety of real-life phenomena.

He explained that passive computation is overmatched by the sheer complexity and ubiquity of today’s “data-scape,” comprising IoT devices such as smartwatches, social platforms, sensors, retail data, and scientific data firehoses such as the Hubble telescope.

Blei, a recipient of the Sloan Fellowship, the National Science Foundation Presidential Early Career Award, and a Guggenheim Fellowship, among other honors, argued that with the right ML “pipelines” in place, the terabytes of data available today offer opportunities not just to predict outcomes, but to understand and interpret patterns.

“Your grandparents’ data sets were matrices, but data today can be a big bag of unstructured things like text, [which] isn’t easily formed as rows and columns but has collections of words with syntax, semantics and linguistic attributes,” he said.

Blei emphasized that, for example, a sociologist who analyzes data from online communities merely to predict what might happen next on Facebook is doing something fundamentally different from one who seeks patterns that offer insight into how the underlying system actually works.

From Blei’s perspective this kind of robust science — finding patterns driving Twitter activity, or, for that matter, identifying the actor for Bond who will attract audiences to the cinema on Friday night — requires a system called probabilistic ML, which connects domain knowledge, whether about society or physics or anything else, to the data extracted from it.

“Our goal is to build probabilistic ML that is expressive (that can do things like turn the latest developments into a machine learning algorithm that respects scientific theories); scalable (can handle terabytes of data); and is easy to develop,” he said.

In his own lab, Blei employs probabilistic ML in myriad ways, such as finding overlapping communities in big social networks, using MRI data to find areas of the brain activated when a subject looks at different objects, finding interpretable patterns in text data, and in econometrics — using big data from retail points to understand consumer behavior.

Each involves a “probabilistic pipeline,” which begins with assumptions about the various data sets, performs computation under those assumptions, uses the results to develop models, and finally revises the models based on the assumptions and computations.
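One way to picture such a loop is the toy sketch below, written in Python with only the standard library. It is an illustrative assumption, not the pipeline used in Blei’s lab: the synthetic data, the simple Normal model, the function names and the crude revision step are all invented for the example. The first pass assumes the measurement noise is small, computes a posterior for the unknown mean, and then checks whether data simulated from the fitted model look like the real data. When they do not, the assumption is revised and the loop runs again.

```python
import random
import statistics


def posterior_for_mean(data, noise_sd, prior_mean=0.0, prior_sd=10.0):
    """Conjugate Normal posterior over the unknown mean, with noise_sd held fixed."""
    n = len(data)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / noise_sd ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (
        prior_precision * prior_mean + data_precision * statistics.fmean(data)
    )
    return post_mean, post_var ** 0.5


def predictive_check(data, noise_sd, rng, n_rep=500):
    """Share of simulated data sets whose spread is at least the observed spread.

    Values near 0 or 1 suggest the model's assumptions do not match the data.
    """
    post_mean, post_sd = posterior_for_mean(data, noise_sd)
    observed_spread = statistics.stdev(data)
    hits = 0
    for _ in range(n_rep):
        mu = rng.gauss(post_mean, post_sd)                   # draw a plausible mean
        replicated = [rng.gauss(mu, noise_sd) for _ in data]  # simulate a data set
        if statistics.stdev(replicated) >= observed_spread:
            hits += 1
    return hits / n_rep


rng = random.Random(0)
# Synthetic stand-in for real measurements; the true noise sd is 3.0.
data = [rng.gauss(5.0, 3.0) for _ in range(200)]

# Steps 1 and 2: assume a noise sd of 1.0, compute, and check the result.
first_check = predictive_check(data, noise_sd=1.0, rng=rng)

# Step 3: revise the assumption (here, crudely, to the sample sd) and repeat.
revised_sd = statistics.stdev(data)
second_check = predictive_check(data, noise_sd=revised_sd, rng=rng)

print(f"check under first assumption:   {first_check:.2f}")
print(f"check under revised assumption: {second_check:.2f}")
```

The first check comes out near zero because the assumed noise level cannot reproduce the spread in the data; after the revision, the simulated and observed spreads are comparable. Real applications would use far richer models, but the assume, compute, check and revise cycle has the same shape.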


“The Path to the Nobel Prize,” the final seminar in the series, taking place on Tuesday, December 11, 2018 at 370 Jay Street, 1201 Seminar Room, will feature Richard J. Roberts, Chief Scientific Officer at New England Biolabs, in Beverly, Massachusetts.