Speaker: Nizar Habash, Columbia University
The Arabic language is spoken by some 300 million people. It is also the language of worship for over 1.5 billion Muslims. In the context of natural language processing, Arabic poses many challenges: it is both morphologically rich and highly ambiguous, and it has complex morpho-syntactic agreement rules and many irregular forms. Arabic also has a large number of unstandardized dialectal variants that differ from Standard Arabic as much as the Romance languages differ from Latin. In this talk, I will present some of my research addressing these challenges. My overall approach combines linguistic knowledge with data-driven statistical modeling. I will also discuss the results of using the tools and resources produced by this research in natural language applications, in particular machine translation.
Dr. Nizar Habash received his PhD in 2003 from the Computer Science Department at the University of Maryland, College Park. He is currently a research scientist at the Center for Computational Learning Systems at Columbia University. His research includes work on machine translation, natural language generation, lexical semantics, morphological analysis, generation, and disambiguation, syntactic parsing, and computational modeling of Arabic dialects. He recently published the book "Introduction to Arabic Natural Language Processing". Nizar's website is at http://www.nizarhabash.com.