Analytic Theories of Language, Creativity, and Reasoning in Artificial Intelligence
Part of the Special ECE Seminar Series
Modern Artificial Intelligence
Speaker:
Surya Ganguli, Stanford University
Abstract:
Two major advances in AI over the last decade are language models and diffusion models. However, the remarkable capabilities of such complex models often elude explanation by analytic theory. I will discuss several works in which simple analytic theories can quantitatively explain their performance characteristics. First, for language modeling, we can, for the first time, quantitatively predict the power-law exponents governing neural scaling laws that relate loss to the amount of training data. We show that these neural exponents are simply a function of two statistical properties of language itself. Second, for diffusion models, we develop an analytic theory of creativity that explains how they can generate exponentially many novel images from a finite training set by constructing patch mosaics of the training data. Our analytic theory predicts individual image outputs of trained convolution-only diffusion models with high fidelity. Third, we show how co-designing learning at training time and search at test time can lead to improved mathematical reasoning with language models. Time permitting, I will also mention some of our work on constructing, explaining, and controlling digital twins of the brain.
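To make the scaling-law claim concrete: a neural scaling law posits that test loss falls as a power law in dataset size, L(N) ≈ A · N^(−α), and the exponent α can be read off as the slope of loss versus data in log-log space. The sketch below (an illustration of the general fitting procedure, not the speaker's method; `alpha_true` and `A` are hypothetical values) generates idealized losses with a known exponent and recovers it by least-squares regression on the logarithms.

```python
import math

# Hypothetical power-law loss curve: L(N) = A * N**(-alpha_true)
alpha_true, A = 0.5, 10.0
Ns = [10 ** k for k in range(3, 8)]            # dataset sizes 1e3 .. 1e7
Ls = [A * n ** (-alpha_true) for n in Ns]      # idealized test losses

# In log-log space the law is linear: log L = log A - alpha * log N,
# so a least-squares slope estimate gives -alpha.
xs = [math.log(n) for n in Ns]
ys = [math.log(l) for l in Ls]
xbar = sum(xs) / len(xs)
ybar = sum(ys) / len(ys)
slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
alpha_est = -slope
print(round(alpha_est, 3))  # recovers 0.5
```

On real loss measurements the fit is noisy and often includes an irreducible-loss offset; this toy version only shows how the exponent relates to the log-log slope.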
Bio:
Surya Ganguli is a professor of Applied Physics at Stanford, a Senior Fellow of Stanford's Human-Centered AI Institute, and a Venture Partner at General Catalyst. Dr. Ganguli triple-majored in physics, mathematics, and EECS at MIT, completed a master's degree in pure mathematics and a Ph.D. in string theory at Berkeley, and did a postdoc in theoretical neuroscience at UCSF. He has also been a visiting researcher at both Google and Meta AI and a Venture Partner at a16z. His research spans neuroscience, machine learning, and physics, focusing on understanding and improving how both biological and artificial neural networks learn striking emergent computations. He has been awarded a Swartz Fellowship in computational neuroscience, a Burroughs Wellcome Career Award, a Terman Award, two NeurIPS Outstanding Paper Awards, a Sloan Fellowship, a James S. McDonnell Foundation Scholar Award in human cognition, a McKnight Scholar Award in neuroscience, a Simons Investigator Award in the mathematical modeling of living systems, an NSF CAREER Award, a Schmidt Science Polymath Award, and an AI2050 Senior Fellowship.