Reconstructing Human Speech from Recorded Brain Activity Using Deep Learning

Lecture / Panel
 
Open to the Public

Speaker:

Yao Wang, PhD
Vice Dean for Faculty Affairs and Professor
New York University Tandon School of Engineering

Abstract:

Decoding human speech from neural signals is essential for brain-computer interface (BCI) technologies that aim to restore speech in populations with neurological deficits. It remains a highly challenging task, however, compounded by the scarcity of neural recordings paired with corresponding speech and by variability in electrode placement across participants. In collaboration with Prof. Adeen Flinker of the NYU Grossman School of Medicine, Prof. Wang and her research team have been developing approaches for reconstructing human speech from cortical signals recorded with intracranial electrodes. We will present a novel deep-learning framework comprising a neural decoder that translates cortical signals into interpretable speech parameters and a differentiable speech synthesizer that maps those parameters to spectrograms.

While our earlier approach, like most prior work, operated only with electrodes on a dense 2D grid (i.e., an electrocorticography (ECoG) array) and data from a single patient, our more recent neural decoder architecture accommodates both surface ECoG and depth (stereotactic EEG, or sEEG) electrodes and can be trained on data from multiple participants. Our framework generates natural-sounding speech and achieves high decoding correlation with the ground-truth spectrogram across a large cohort of participants implanted with ECoG electrodes, sEEG electrodes, or both. Moreover, models trained on multiple participants generalize to unseen participants.

Our model can employ temporal operations that are causal (using current and past neural signals), anticausal (using current and future signals), or noncausal (combining both), and it achieves high decoding performance even when limited to causal operations, which is essential for real-time neural prostheses. Contribution analysis of the causal and anticausal models further allows us to disentangle feedforward motor control from auditory feedback processing in speech production, revealing a surprisingly mixed feedforward and feedback cortical recruitment during speech production. Prof. Wang's presentation will highlight the technical advances, neuroscientific insights, and translational potential of this groundbreaking research.
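To make the two ideas in the abstract concrete, the sketch below is a minimal PyTorch illustration (not the team's released implementation; all module and parameter names are hypothetical) of how a decoder-plus-differentiable-synthesizer pipeline can be trained end to end from a spectrogram loss, and how one-sided padding restricts a temporal convolution to past samples (causal), future samples (anticausal), or both (noncausal):

import torch
import torch.nn as nn
import torch.nn.functional as F

def temporal_conv(channels, k, mode):
    """1-D convolution whose receptive field is restricted to past samples
    (causal), future samples (anticausal), or both (noncausal)."""
    pad = {"causal": (k - 1, 0),          # sees t-k+1 .. t
           "anticausal": (0, k - 1),      # sees t .. t+k-1
           "noncausal": ((k - 1) // 2, k // 2)}[mode]
    return nn.Sequential(nn.ConstantPad1d(pad, 0.0),
                         nn.Conv1d(channels, channels, k),
                         nn.ReLU())

class NeuralDecoder(nn.Module):
    """Maps electrode features to a small set of speech parameters;
    'causal' mode uses only current and past neural signals, the
    constraint required for real-time prostheses."""
    def __init__(self, n_electrodes, n_params, k=5, mode="causal"):
        super().__init__()
        self.net = nn.Sequential(temporal_conv(n_electrodes, k, mode),
                                 temporal_conv(n_electrodes, k, mode),
                                 nn.Conv1d(n_electrodes, n_params, 1))
    def forward(self, neural):            # (batch, electrodes, time)
        return self.net(neural)           # (batch, n_params, time)

class DifferentiableSynth(nn.Module):
    """Stand-in for the differentiable synthesizer: any smooth map from
    speech parameters to a spectrogram lets the spectrogram loss
    backpropagate through it into the decoder."""
    def __init__(self, n_params, n_freq):
        super().__init__()
        self.proj = nn.Conv1d(n_params, n_freq, 1)
    def forward(self, params):
        return F.relu(self.proj(params))  # (batch, freq, time)

# One end-to-end training step on toy data.
decoder = NeuralDecoder(n_electrodes=64, n_params=16, mode="causal")
synth = DifferentiableSynth(n_params=16, n_freq=128)
neural = torch.randn(8, 64, 200)   # hypothetical ECoG/sEEG feature windows
target = torch.rand(8, 128, 200)   # paired ground-truth spectrograms
loss = F.mse_loss(synth(decoder(neural)), target)
loss.backward()

Swapping mode to "anticausal" or "noncausal" changes only the padding, which is what makes the causal/anticausal contribution analysis described above possible within a single architecture.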

Prof. Wang leads research at the intersection of video processing, medical imaging, and AI-driven applications. She earned her B.S. and M.S. degrees in Electronic Engineering from Tsinghua University in Beijing, China, and received her Ph.D. in Electrical and Computer Engineering from the University of California, Santa Barbara. She then joined the faculty of Polytechnic University (now NYU Tandon), becoming a full professor in 2000. Her contributions to video processing and communications earned her recognition as an IEEE Fellow and numerous accolades, including the New York City Mayor's Award for Excellence in Science and Technology and multiple IEEE best paper awards. She also received the NYU Tandon Distinguished Teacher Award. Prof. Wang authored the widely used textbook Video Processing and Communications and has served as an associate editor for IEEE Transactions on Multimedia and IEEE Transactions on Circuits and Systems for Video Technology. Her research has been supported by the National Science Foundation and the National Institutes of Health.