Linear-time Algorithms in Natural Language Understanding and Learning

NYU Community Event

Speaker: Liang Huang, City University of New York


Why are computers so bad at understanding natural language and why are human beings so much better at it? Can we build a model to simulate human language processing so that computers can process human language the same way we humans do, i.e., fast, incremental (left-to-right), and accurate?

In this talk I present a linear-time dynamic programming model for incremental parsing inspired by human sentence processing (from psycholinguistics) as well as compiler theory (LR parsing). This model, being linear-time, is much faster than, but also as accurate as, the dominant cubic-time algorithms. It overcomes the ambiguity explosion problem by approximate dynamic programming, which corresponds to local ambiguity packing in psycholinguistics.

But how do we efficiently learn such a parsing model with approximate inference from huge amounts of data? We propose a general structured machine learning framework based on the structured perceptron that is guaranteed to succeed with inexact search and works well in practice. Our new learning algorithm can learn a large-scale state-of-the-art parsing model with dramatically reduced training time, thus having the potential to scaling up to the whole Web. More importantly, our learning algorithms are widely applicable to other structured domains such as bioinformatics.


Liang Huang recently joined the The City University of New York (CUNY) as an Assistant Professor at Queens College and the Graduate Center. Until August 2012 he was a Research Assistant Professor at the University of Southern California (USC). He received his PhD from the University of Pennsylvania in 2008, and worked as a Research Scientist at Google before moving to USC/ISI. His research focuses on efficient search algorithms for natural language processing, esp. in parsing and machine translation, as well as related structured learning problems and their applications to non-language domains such as biological sequences. His work received a Best Paper Award at ACL 2008, and three Best Paper Nominations at ACL 2007, EMNLP 2008, and ACL 2010. He also received a University Graduate Teaching Award at Penn, and was nominated for the Engineering School Teaching Award at USC.