Anna Choromanska
-
Assistant Professor
-
Alfred P. Sloan Fellow
Anna Choromanska is an Assistant Professor in the Department of Electrical and Computer Engineering (ECE) at the NYU Tandon School of Engineering. She is also affiliated with the NYU Center for Data Science (CDS), NYU Center for Urban Science and Progress (CUSP), NYU Center for Advanced Technology in Communications (CATT), and Connected Cities with Smart Transportation (C2SMART) Center. Prior to joining ECE, she conducted Post-Doctoral studies in the Computer Science Department at the Courant Institute of Mathematical Sciences at NYU under the guidance of Turing Award winner Prof. Yann LeCun. Prof. Choromanska is a recipient of multiple awards, including the NSF CAREER Award, the Alfred P. Sloan Fellowship, and two IBM Global University Program Academic Awards.
Research: Prof. Choromanska's main research focus is deep learning (DL). This form of AI is useful for automatically finding high-quality representations of complex data that are suited for particular learning tasks. As data sets grow inexorably in size and complexity, it becomes ever more difficult to pull useful features from them using hand-crafted feature extractors; thus DL frameworks are becoming increasingly popular. The “Holy Grail” of DL, and one of the toughest challenges in all of modern ML, is to develop a fundamental understanding of DL optimization and generalization. Such an understanding is considered essential for designing efficient (fast-converging), accurate (well-generalizing), and scalable (applicable to large data sets and models and heavily parallelizable) DL optimization strategies. Better algorithmic tools for DL optimization and generalization should have strong impacts on a wide range of large data applications, with substantial savings of time and resources (today the cost of training a single state-of-the-art DL model can reach hundreds of thousands of dollars). Prof. Choromanska's research seeks to address these DL optimization/generalization challenges. In her laboratory, Prof. Choromanska studies how deep neural networks (DNNs) learn, and how to condition the DNN learning process to converge efficiently to high-quality solutions by properly designing the training and/or the DL system architecture. Her research is highly multi-disciplinary and involves DL sub-disciplines including optimization, continual learning, distributed
optimization, sparse coding, and conditional computation. This multi-disciplinary aspect of her research encourages the combination of experimental and theoretical work. Autonomous driving and extremely large dataset analysis are her principal applications of interest.
Service and industrial impact: Prof. Choromanska was a recipient of The Fu Foundation School of Engineering and Applied Science Presidential Fellowship at Columbia University in the City of New York. She has co-authored several international conference papers and refereed journal publications, as well as book chapters. The results of her work are used in production by Facebook (training production vision systems and its entry to the COCO competition) and Baidu, and in product development by NVIDIA. She is also a contributor to Vowpal Wabbit (aka VW), the open-source fast out-of-core learning system. Prof. Choromanska has given over 50 invited and conference talks and serves as a book editor (MIT Press volume), an organizer of top machine learning events (workshops at conferences such as the Conference on Neural Information Processing Systems), and a reviewer and area chair for several top machine learning conferences and journals.
Other interests: Prof. Anna Choromanska is also a pianist who has been playing piano since the age of six and holds diplomas from two music schools. Her piano performance can be found here. She practiced standard and Latin dance with the Columbia University Ballroom Dance Team and is a bronze medalist in amateur couples dance. Prof. Choromanska is also an avid salsa dancer. She performed in the Ache Performance Project of Frankie Martinez, one of the most innovative and renowned Latin contemporary dancers of his generation, and practiced individually with one of the most charismatic female mambo dancers, Lori Ana Perez-Piazza. She also likes dancing hula, especially during her travels to Hawaii. Her dance performances can be found here, here and here. Finally, Prof. Choromanska loves painting and fashion design.
Prof. Choromanska is the director of the Learning Systems Laboratory (LSL).
- optimization and training for deep learning and beyond
- large data analysis
Autonomous Driving: - building intelligent road autonomy
Education
Warsaw University of Technology
MSc, Department of Electronics and Information Technology, 2009
Columbia University in the City of New York
M.Phil. and Ph.D., Department of Electrical Engineering, 2014
Pre-Professional Experience
New York University, Courant Institute of Mathematical Sciences, Computer Science Department
Post-Doctoral Associate
From: April 2014 to December 2016
Working on deep learning (advisor: Prof. Yann LeCun).
Microsoft Research, New York
Summer Internship and Research Collaboration
From: June 2012 to September 2013 and September 2013 to June 2014
Working on logarithmic time extreme multiclass classification (advisor: Dr John Langford).
IBM T.J.Watson Research Center
Research Collaboration
From: May 2012 to June 2013
Recipient of a grant from the Speech and Language Algorithms Department at IBM T. J. Watson Research Center (for one semester). Working on optimization for large-scale learning problems involving conditional random fields, log-linear models, and deep belief networks (advisor: Dr Dimitri Kanevsky; from April 2013, joint work also with Prof. Aleksandr Aravkin).
AT&T Research Laboratories
Summer Internship
From: July 2012 to September 2012
Working on iPLAN project: data analysis and modeling, and data matching (advisor: Dr Alice Chen, manager: Dr Phyllis Weiss).
University of Hawaii at Manoa, Department of Electrical Engineering
Visiting Summer Scholar
From: November 2008 to November 2008
Working on Empirical Mode Decomposition (advisor: Prof. David Y. Y. Yun).
University of Pennsylvania, Smell and Taste Center, Department of Otorhinolaryngology, Head and Neck Surgery
Visiting Summer Scholar
From: September 2008 to September 2008 (with several weeks of cooperation beforehand)
Working on improving software and hardware for electrogustometric medical trials (advisor: Prof. Richard Doty).
University of North Texas Health Science Center, Center for Commercialization of Fluorescence Technologies
Visiting Summer Scholar
From: September 2008 to September 2008
Working on fast algorithms for visualization and analysis of lung epithelial cells imaged using fluorescence technology (advisor: Prof. Ignacy Gryczynski and Prof. Zygmunt Gryczynski).
Centre de Recherche du Centre Hospitalier Universitaire de Montreal, in cooperation with the Center for Commercialization of Fluorescence
Technologies, University of North Texas Health Science Center, Fort Worth, Texas
Summer Internship
From: July 2008 to September 2008
Working on fast algorithms for visualization and analysis of lung epithelial cells imaged using fluorescence technology (advisor: Prof. Ryszard Grygorczyk). The project was supported by the Canadian Institutes of Health Research (CIHR) and the Natural Sciences and Engineering Research Council of Canada (NSERC).
Awards
HONORS, AWARDS, AND ACHIEVEMENTS
Scientific:
NSF CAREER Award, 2021
IBM Global University Program Academic Award, 2021
Alfred P. Sloan Research Fellowship in Computer Science, 2020
IBM Faculty Award, 2020
Student Best Paper Award, First Place, for the work T. Jebara, A. Choromanska, Majorization for CRFs and Latent Likelihoods, 7th Annual Machine Learning Symposium, New York Academy of Science, 2012
Student Best Paper Award, Third Place, for the work A. Choromanska, C. Monteleoni, Online clustering with experts, 6th Annual Machine Learning Symposium, New York Academy of Science, 2011
The Fu Foundation School of Engineering and Applied Science Presidential Fellowship holder, Columbia University in the City of New York, 2009-2012
Departmental Scholarship holder for the Achievements in Science, Warsaw University of Technology, Department of Electronics and Information Technology, 2005-2009
Winner (first place) of the National Mathematics Competition held by Warsaw University of Technology, 2004
Laureate of the National Physics Competition held by Warsaw University of Technology, 2004
Other:
Diploma of the Warsaw School of Art "Labirynt" (painting), 2007
Bronze medalist in amateur couples dance, 2006
Diploma of the Summer School of Italian Language in Rome, 2006
CONTRIBUTOR
Own open-source implementations: the majority of code associated with published papers is publicly released (website and/or GitHub).
Open source systems: Vowpal Wabbit (aka VW) open source fast out-of-core learning system library and program.
Industry:
Robotic platform based on a subscale car from [S. Fang, A. Choromanska, Reconfigurable Network for Efficient Inferencing in Autonomous Vehicles, 2018] deployed by the NVIDIA Automotive HMI team for testing autonomous driving systems
EASGD algorithm from [S. Zhang, A. Choromanska, Y. LeCun, Deep learning with Elastic Averaging SGD, in the Neural Information Processing Systems Conference (NIPS), 2015] is used in production by Facebook (training production vision systems and entry to COCO competition) and Baidu
Publications
Conferences:
J. Wang, Y. Teng, A. Choromanska, AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop, in the Conference on Uncertainty in Artificial Intelligence (UAI), 2024. Acceptance Rate [27%]. pdf
H. He, J. Wang, A. Choromanska, Adjacent Leader Decentralized Stochastic Gradient Descent, in the European Conference on Artificial Intelligence (ECAI), 2024. Acceptance Rate [23%]. pdf
T. Dimlioglu, A. Choromanska, GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. Acceptance Rate [28%]. pdf
H. Zhu, H. He, A. Choromanska, S. Ravindran, B. Shi, L. Chen, Multi-View Radar Autoencoder for Self-Supervised Automotive Radar Representation Learning, in the IEEE Intelligent Vehicles Symposium (IEEE IV), 2024 pdf
V. Singh, A. Choromanska, S. Li, Y. Du, Wake-Sleep Energy Based Models for Continual Learning, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision, 2024 pdf
H. Zhu, M. Majzoubi, A. Jain, A. Choromanska, TAME: Task Agnostic Continual Learning using Multiple Experts, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision, 2024 pdf
S. Fang, H. Zhu, D. Bisla, A. Choromanska, S. Ravindran, D. Ren, R. Wu, ERASE-Net: Efficient Segmentation Networks for Automotive Radar Signals, in the IEEE International Conference on Robotics and Automation (ICRA), 2023 pdf
D. Bisla, J. Wang, A. Choromanska, Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2022. Acceptance Rate [29%]. pdf
Y. Teng, A. Choromanska, M. Campbell, S. Lu, P. Ram, L. Horesh, Overcoming Catastrophic Forgetting via Direction-Constrained Optimization, in the European Conference on Machine Learning and Data Mining (ECML-PKDD), 2022. Acceptance Rate [26%]. pdf
S. Fang, A. Choromanska, Backdoor attacks on the DNN Interpretation System, in the AAAI Conference on Artificial Intelligence (AAAI), 2022. Acceptance Rate [15%]. (extension of NeurIPS 2020 paper) pdf
A. N. Saridena, A. Choromanska, Efficient patching of DNNs for Autonomous Vehicles, in the IEEE International Conference on Robotics and Automation (ICRA), 2022
D. Bisla, A. N. Saridena, A. Choromanska, A Theoretical-Empirical Approach to Estimating Sample Complexity of DNNs, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Second Workshop on Fair, Data-Efficient, and Trusted Computer Vision (TCV), 2021 pdf
C. Lema, A. Choromanska, Approximating Ground State Energies and Wave Functions of Physical Systems with Neural Networks, in the Neural Information Processing Systems Conference (NeurIPS) Workshop on Machine Learning and the Physical Sciences, 2020 pdf
S. Fang, A. Choromanska, Backdoor attacks on the DNN Interpretation System, in the Neural Information Processing Systems Conference (NeurIPS) Workshop on Dataset Curation and Security, 2020 pdf
J. Wang, A. Choromanska, SGB: Stochastic Gradient Bound Method for Optimizing Partition Functions, in the Neural Information Processing Systems Conference Workshop on Optimization for Machine Learning (NeurIPS OPT), 2020 pdf
Y. Teng, A. Choromanska, M. Campbell, Continual learning with direction-constrained optimization, in the Neural Information Processing Systems Conference (NeurIPS) Workshop on Meta-Learning, 2020 pdf
A. Pacchiano, J. Parker-Holder, Y. Tang, A. Choromanska, K. Choromanski, M. I. Jordan, Learning to score behaviors for guided policy optimization, in the International Conference on Machine Learning (ICML), 2020 pdf
S. Fang, A. Choromanska, Multi-modal Experts Network for Autonomous Driving, in the IEEE International Conference on Robotics and Automation (ICRA), 2020 pdf
M. Majzoubi, A. Choromanska, LdSM: Logarithm-depth Streaming Multi-label Decision Trees, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2020 talk pdf
Y. Teng, W. Gao, F. Chalus, A. Choromanska, D. Goldfarb, A. Weller, Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models, in the Neural Information Processing Systems Conference (NeurIPS), 2019. Acceptance Rate [21%]. pdf (see also pdf for a gentle extension of this work and the link for the PyTorch-based comprehensive distributed training library for deep networks that contains codes for LSGD, as well as for several other methods)
A. Choromanska, B. Cowen, S. Kumaravel, R. Luss, M. Rigotti, I. Rish, B. Kingsbury, P. DiAchille, V. Gurev, R. Tejwani, D. Bouneffouf, Beyond Backprop: Online Alternating Minimization with Auxiliary Variables, in the International Conference on Machine Learning (ICML), 2019. Acceptance Rate [23%]. pdf
D. Bisla, A. Choromanska, R. Berman, D. Polsky, J. Stein, Towards Automated Melanoma Detection with Deep Learning: Data Purification and Augmentation, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) ISIC Skin Image Analysis Workshop, 2019 pdf
S. Fang, A. Choromanska, Reconfigurable Network for Efficient Inferencing in Autonomous Vehicles, in the International Conference on Robotics and Automation (ICRA), 2019 pdf
M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, U. Muller, P. Yeres, K. Zieba, VisualBackProp: efficient visualization of CNNs for autonomous driving, in the International Conference on Robotics and Automation (ICRA), 2018 pdf
N. Patel, A. N. Saridena, A. Choromanska, P. Krishnamurthy, F. Khorrami, Adversarial Learning Based On-Line Anomaly Monitoring for Assured Autonomy, in the International Conference on Intelligent Robots and Systems (IROS), 2018 pdf
S. Minaee, Y. Wang, A. Choromanska, S. Chung, X. Wang, E. Fieremans, S. Flanagan, J. Rath, Y. W. Lui, A Deep Unsupervised Learning Approach Toward MTBI Identification Using Diffusion MRI, in the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018 pdf
N. Patel, A. Choromanska, P. Krishnamurthy, F. Khorrami, Sensor Modality Fusion with CNNs for UGV Autonomous Driving in Indoor Environments, in the International Conference on Intelligent Robots and Systems (IROS), 2017 pdf
Y. Jernite, A. Choromanska, D. Sontag, Simultaneous Learning of Trees and Representations for Extreme Classification and Density Estimation, in the International Conference on Machine Learning (ICML), 2017 pdf
P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, R. Zecchina, Entropy-SGD: Biasing Gradient Descent Into Wide Valleys, in the International Conference on Learning Representations (ICLR), 2017. Acceptance Rate [36%]. pdf
M. Bojarski, A. Choromanska, K. Choromanski, F. Fagan, C. Gouy-Pailler, A. Morvan, N. Sakr, T. Sarlos, J. Atif, Structured adaptive and random spinners for fast machine learning computations, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2017. Acceptance Rate [31.70%]. pdf
A. Choromanska, K. Choromanski, M. Bojarski, T. Jebara, S. Kumar, Y. LeCun, Binary embeddings with structured hashed projections, in the International Conference on Machine Learning (ICML), 2016. Oral presentation: Acceptance Rate [24.27%]. pdf
A. Choromanska, J. Langford, Logarithmic Time Online Multiclass prediction, in the Neural Information Processing Systems Conference (NIPS), 2015. Spotlight talk: Acceptance Rate [3.65%]. talk pdf
S. Zhang, A. Choromanska, Y. LeCun, Deep learning with Elastic Averaging SGD, in the Neural Information Processing Systems Conference (NIPS), 2015. Spotlight talk: Acceptance Rate [3.65%]. (extension of ICLR 2015 paper) talk pdf
S. Zhang, A. Choromanska, Y. LeCun, Deep learning with Elastic Averaging SGD (initial results), in the International Conference on Learning Representations (ICLR) Workshop, CoRR, abs/1412.6651v5, 2015
A. Choromanska, Y. LeCun, G. Ben Arous, Open Problem: The landscape of the loss surfaces of multilayer networks, in the Conference on Learning Theory (COLT), Open Problems, 2015 pdf
A. Choromanska, M. B. Henaff, M. Mathieu, G. Ben Arous, Y. LeCun, The Loss Surfaces of Multilayer Networks, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2015 pdf
A. Y. Aravkin, A. Choromanska, T. Jebara, D. Kanevsky, Semistochastic quadratic bound methods (initial results), in the International Conference on Learning Representations (ICLR) Workshop, CoRR, abs/1309.1369, 2014 pdf
A. Choromanska, T. Jebara, H. Kim, M. Mohan, C. Monteleoni, Fast spectral clustering via the Nystrom method, in the International Conference on Algorithmic Learning Theory (ALT), 2013 pdf
A. Choromanska, K. Choromanski, G. Jagannathan, C. Monteleoni, Differentially-Private Learning of Low Dimensional Manifolds, in the International Conference on Algorithmic Learning Theory (ALT), 2013 pdf
A. Choromanska, A. Agarwal, J. Langford, Extreme Multi Class Classification, in the Neural Information Processing Systems Conference (NIPS) Workshop: eXtreme Classification, 2013
T. Jebara, A. Choromanska, Majorization for CRFs and Latent Likelihoods, in the Neural Information Processing Systems Conference (NIPS), 2012. Spotlight talk: Acceptance Rate [3.58%] (Student Best Paper Award, First Place, at the 7th Annual Machine Learning Symposium, New York Academy of Science, 2012) talk pdf
A. Choromanska, C. Monteleoni, Online clustering with experts, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2012. Oral presentation: Acceptance Rate [5.97%] (Student Paper Award, Third Place, at the 6th Annual Machine Learning Symposium, New York Academy of Science, 2011) talk pdf and supplement
A. Choromanska, D. Kanevsky, T. Jebara, Majorization for Deep Belief Networks, in the Neural Information Processing Systems Conference (NIPS) Workshop: Log-linear models, 2012
A. Choromanska and C. Monteleoni, Online Clustering with Experts (initial results), in the International Conference on Machine Learning (ICML) Workshop: Online Trading of Exploration and Exploitation 2, Journal of Machine Learning Research (JMLR) Workshop and Conference Proceedings, 2011 pdf
Journals and book chapters:
T. Dimlioglu, J. Wang, D. Bisla, A. Choromanska, S. Odie, L. Bukhman, A. Olomola, J. D. Wong, Automatic Document Classification via Transformers for Regulations Compliance Management in Large Utility Companies, in the Neural Computing and Applications, 2023 pdf
A. N. Saridena, A. Choromanska, DNN Patching: Progressive Fixing and Augmenting the Functionalities of DNNs for Autonomous Vehicles, in the IEEE Robotics and Automation Letters (RA-L), 2022 pdf
N. Patel, A. N. Saridena, A. Choromanska, P. Krishnamurthy, F. Khorrami, Learning-Based Real-Time Process-Aware Anomaly Monitoring for Assured Autonomy, in the IEEE Transactions on Intelligent Vehicles, 2020 pdf
B. Cowen, A. Nandini Saridena, A. Choromanska, LSALSA: Accelerated Source Separation via Learned Sparse Coding, in the Machine Learning, 2019 (the paper was also accepted for presentation in the ECML-PKDD conference) pdf
P. Chaudhari, A. Choromanska, S. Soatto, Y. LeCun, C. Baldassi, C. Borgs, J. Chayes, L. Sagun, R. Zecchina, Entropy-SGD: Biasing Gradient Descent Into Wide Valleys, in the Journal of Statistical Mechanics: Theory and Experiment, 2019 pdf
Y. Teng, A. Choromanska, Invertible Autoencoder for domain adaptation, in the MDPI Computation, 2019 pdf
A. Choromanska, I. K. Jain, Extreme Multiclass Classification Criteria, in the MDPI Computation, 2019 pdf
N. Patel, A. Choromanska, P. Krishnamurthy, F. Khorrami, A Deep Learning Gated Architecture for UGV Navigation Robust to Sensor Failures, in the Journal of Robotics and Autonomous Systems, 2019 pdf
A. Y. Aravkin, A. Choromanska, T. Jebara, D. Kanevsky, Chapter: Semistochastic quadratic bound methods, in Log-Linear Models, Extensions and Applications, MIT Press, 2018 pdf
A. Choromanska, K. Choromanski, G. Jagannathan, C. Monteleoni, Differentially-Private Learning of Low Dimensional Manifolds, in the Theoretical Computer Science, 2015 pdf
A. Choromanska, S-F. Chang, R. Yuste, Automatic Reconstruction of 3D neural morphologies using multi-scale graph-based tracking, in the Frontiers in Neural Circuits, 6:25, 2012 pdf
PhD Thesis:
A. Choromanska, Selected machine learning reductions, PhD Thesis, 2014 pdf
Technical reports:
D. Bisla, A. Choromanska, VisualBackProp for learning using privileged information with CNNs, 2019 pdf
M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, U. Muller, Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car, CoRR, abs/1704.07911, 2017 pdf
A. Choromanska, K. Choromanski, M. Bojarski, On the boosting ability of top-down decision tree learning algorithm for multiclass classification, CoRR, abs/1605.05223, 2016 pdf
M. Bojarski, A. Choromanska, K. Choromanski, Y. LeCun, Differentially- and non-differentially-private random decision trees, CoRR, abs/1410.6973, 2015 pdf
K. Choromanski, A. Choromanska, M. Bojarski, Deep Neural Networks reconstruct graphons, 2015
A. Agarwal, A. Choromanska, K. Choromanski, Notes on Using Determinantal Point Processes for Clustering with Applications to Text Clustering, CoRR, abs/1410.6975, 2014 pdf
A. Choromanska, T. Jebara, Stochastic Bound Majorization, CoRR, abs/1309.5605, 2013 pdf
Code is on GitHub or available upon request.
NSF CAREER: From Analysis to Practice: Landscape-driven Optimization Algorithms for Deep Learning
The focus of this research project is on reconciling the dichotomy between the optimizers that are commonly used to train deep learning models (generic convex optimization tools) and the actual, non-convex properties of deep learning loss functions. We provide a fundamental study of the landscape of non-convex loss functions arising in a deep learning setting and the learning characteristics of deep learning systems with a goal of obtaining an "alphabet" of basic optimization and generalization properties of these systems that hold across a variety of model architectures and data sets. We use the acquired knowledge to develop a new generation of optimization strategies tailored to the deep learning setting, including parallel schemes dedicated to large data and models. This work is sponsored by the NSF CAREER Award #2041872 and the resulting research papers are listed below.
J. Wang, Y. Teng, A. Choromanska, AutoDrop: Training Deep Learning Models with Automatic Learning Rate Drop, in the Conference on Uncertainty in Artificial Intelligence (UAI), 2024. Acceptance Rate [27%]. pdf
Abstract: Modern deep learning (DL) architectures are trained using variants of the SGD algorithm and typically rely on the user to manually drop the learning rate when the training curve saturates. In this paper, we develop an algorithm, which we call AutoDrop, that realizes the learning rate drop automatically and stems from the properties of the learning dynamics of DL systems. Specifically, it is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down, thus angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity "resets" and follows the pattern described above, increasing again until saturation. AutoDrop is built on this idea and drops the learning rate whenever the angular velocity saturates. The method is simple to implement, computationally cheap, and by design avoids the short-horizon bias problem. We show that AutoDrop achieves favorable performance compared to many different baseline manual and automatic learning rate schedulers, and matches the SOTA performance on all our experiments. On the theoretical front, we claim two contributions: we formulate the learning rate behavior based on the angular velocity and provide general convergence theory for the learning rate schedulers that decrease the learning rate step-wise, rather than continuously as is commonly analyzed.
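The sketch below illustrates the angular-velocity trigger on a toy full-gradient problem. It is a minimal sketch only: the function names (angular_velocity, train_autodrop_like), the saturation test, and all thresholds are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def angular_velocity(prev_dir, new_dir):
    # Angle (radians per step) between successive convergence directions.
    cos = np.dot(prev_dir, new_dir) / (np.linalg.norm(prev_dir) * np.linalg.norm(new_dir) + 1e-12)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def train_autodrop_like(grad_fn, w, lr=0.1, drop_factor=0.1, patience=20, steps=2000):
    """Gradient descent with an automatic LR drop once the angular velocity saturates."""
    w0 = w.copy()
    prev_dir, best_av, stall = None, 0.0, 0
    for _ in range(steps):
        w = w - lr * grad_fn(w)
        cur_dir = w - w0                          # convergence direction so far
        if prev_dir is not None and np.linalg.norm(cur_dir) > 0:
            av = angular_velocity(prev_dir, cur_dir)
            if av > best_av + 1e-4:               # angular velocity still increasing
                best_av, stall = av, 0
            else:                                  # saturation: count toward a drop
                stall += 1
                if stall >= patience:
                    lr *= drop_factor              # drop the learning rate ...
                    best_av, stall = 0.0, 0        # ... and let the angular velocity "reset"
        prev_dir = cur_dir
    return w, lr

# Toy usage on a quadratic bowl.
w_final, lr_final = train_autodrop_like(lambda w: 2.0 * w, np.array([5.0, -3.0]))
```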
H. He, J. Wang, A. Choromanska, Adjacent Leader Decentralized Stochastic Gradient Descent, in the European Conference on Artificial Intelligence (ECAI), 2024. Acceptance Rate [23%]. pdf
Abstract: This work focuses on the decentralized deep learning optimization framework. We propose Adjacent Leader Decentralized Gradient Descent (AL-DSGD), for improving final model performance, accelerating convergence, and reducing the communication overhead of decentralized deep learning optimizers. AL-DSGD relies on two main ideas. Firstly, to increase the influence of the strongest learners on the learning system it assigns weights to different neighbor workers according to both their performance and the degree when averaging among them, and it applies a corrective force on the workers dictated by both the currently best-performing neighbor and the neighbor with the maximal degree. Secondly, to alleviate the problem of the deterioration of the convergence speed and performance of the nodes with lower degrees, AL-DSGD relies on dynamic communication graphs, which effectively allows the workers to communicate with more nodes while keeping the degrees of the nodes low. Experiments demonstrate that AL-DSGD accelerates the convergence of the decentralized state-of-the-art techniques and improves their test performance especially in the communication constrained environments. We also theoretically prove the convergence of the proposed scheme. Finally, we release to the community a highly general and concise PyTorch-based library for distributed training of deep learning models that supports easy implementation of any distributed deep learning approach ((a)synchronous, (de)centralized).
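As a rough illustration of the degree- and performance-weighted averaging and the corrective pull, here is a simplified single communication round. The weighting formula, the pull coefficient, and the static graph are assumptions made for this sketch (AL-DSGD additionally relies on dynamic communication graphs).

```python
import numpy as np

def al_dsgd_like_round(params, losses, graph, pull=0.1):
    """One simplified communication round in the spirit of AL-DSGD.

    params: worker -> parameter vector, losses: worker -> current loss,
    graph: worker -> list of neighbors (assumed non-empty).
    """
    new_params = {}
    for w, nbrs in graph.items():
        group = [w] + list(nbrs)
        # Weight neighbors by performance (inverse loss) and by node degree.
        raw = np.array([(1.0 / (losses[n] + 1e-8)) * (1 + len(graph[n])) for n in group])
        weights = raw / raw.sum()
        avg = sum(wt * params[n] for wt, n in zip(weights, group))
        # Corrective pull toward the best-performing and the highest-degree neighbor.
        best = min(nbrs, key=lambda n: losses[n])
        hub = max(nbrs, key=lambda n: len(graph[n]))
        new_params[w] = avg + pull * (params[best] - params[w]) + pull * (params[hub] - params[w])
    return new_params

# Tiny usage: three workers on a fully connected graph.
params = {i: np.full(4, float(i)) for i in range(3)}
losses = {0: 0.5, 1: 0.8, 2: 0.3}
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
params = al_dsgd_like_round(params, losses, graph)
```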
T. Dimlioglu, A. Choromanska, GRAWA: Gradient-based Weighted Averaging for Distributed Training of Deep Learning Models, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2024. Acceptance Rate [28%]. pdf
Abstract: We study distributed training of deep learning models in time-constrained environments. We propose a new algorithm that periodically pulls workers towards the center variable computed as a weighted average of workers, where the weights are inversely proportional to the gradient norms of the workers such that recovering the flat regions in the optimization landscape is prioritized. We develop two asynchronous variants of the proposed algorithm that we call Model-level and Layer-level Gradient-based Weighted Averaging (resp. MGRAWA and LGRAWA), which differ in terms of the weighting scheme that is either done with respect to the entire model or is applied layer-wise. On the theoretical front, we prove the convergence guarantee for the proposed approach in both convex and non-convex settings. We then experimentally demonstrate that our algorithms outperform the competitor methods by achieving faster convergence and recovering better quality and flatter local optima. We also carry out an ablation study to analyze the scalability of the proposed algorithms in more crowded distributed training environments. Finally, we report that our approach requires less frequent communication and fewer distributed updates compared to the state-of-the-art baselines.
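The core weighting rule is short enough to sketch. Below is a minimal, model-level (MGRAWA-style) illustration; the pull strength and the synchronous update are assumptions of the sketch, not the paper's exact scheme.

```python
import numpy as np

def grawa_center(worker_params, worker_grads):
    """Center variable: average of workers weighted inversely to their gradient norms,
    so workers sitting in flatter regions (small gradients) contribute more."""
    inv_norms = np.array([1.0 / (np.linalg.norm(g) + 1e-12) for g in worker_grads])
    weights = inv_norms / inv_norms.sum()
    return sum(w * p for w, p in zip(weights, worker_params))

def pull_to_center(worker_params, center, strength=0.1):
    """Periodic step that pulls every worker toward the center variable."""
    return [p + strength * (center - p) for p in worker_params]

# Tiny usage with two workers.
workers = [np.array([1.0, 2.0]), np.array([0.5, 1.5])]
grads = [np.array([0.1, 0.1]), np.array([1.0, 1.0])]
center = grawa_center(workers, grads)      # dominated by the small-gradient worker
workers = pull_to_center(workers, center)
```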
V. Singh, A. Choromanska, S. Li, Y. Du, Wake-Sleep Energy Based Models for Continual Learning, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision, 2024 pdf
Abstract: This paper introduces a novel approach for continually training Energy-Based Models (EBMs) on the classification problems in the challenging setting of class incremental learning. Despite the fact that EBMs offer longer retention of knowledge on prior tasks, training EBMs contrastively remains a challenge. Driven by biological plausibility, we leverage the observation that sleep in humans supports active system consolidation and propose a new approach for training EBMs, which we call Wake-Sleep Energy Based Models (WS-EBMs), which rely on wake-sleep cycles. Our training approach consists of short wake phases followed by long sleep phases. During the short wake phase, the free energy associated with ground truth labels is minimized, which conditions the model towards the correct solutions. This is followed by a long sleep phase, where the free energy of the whole system is minimized contrastively, which allows the model to push the energy of incorrect solutions further from the correct response. We provide a theoretical analysis of WS-EBM showing that it satisfies the sufficient condition for designing proper EBM loss. Our empirical evaluation confirms the plausibility of our approach and demonstrates favorable performance of WS-EBM compared to traditional EBM training as well as state-of-the-art class-incremental continual learning techniques. Furthermore, our proposed two-phase training strategy can be easily integrated with existing techniques resulting in substantial boosts in their performance. Finally, we also provide interesting insights justifying our approach by analyzing the orthogonality between the sequential task vectors, and flatness of the optimized energy surfaces, which may guide the design of class incremental continual learning strategies.
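A minimal PyTorch sketch of the wake/sleep alternation is given below, using the standard contrastive decomposition of EBM training as a stand-in for the paper's free-energy objectives; the ToyEBM model, the exact loss split, and the phase lengths are illustrative assumptions rather than the WS-EBM implementation.

```python
import torch
import torch.nn as nn

class ToyEBM(nn.Module):
    """Tiny energy model: one energy value per class."""
    def __init__(self, dim, n_classes):
        super().__init__()
        self.energy = nn.Linear(dim, n_classes)

    def forward(self, x):
        return self.energy(x)                     # shape: (batch, n_classes)

def wake_sleep_cycle(model, opt, x, y, wake_steps=1, sleep_steps=4):
    # Short wake phase: lower the energy of the ground-truth labels only.
    for _ in range(wake_steps):
        opt.zero_grad()
        e = model(x)
        e.gather(1, y.unsqueeze(1)).mean().backward()
        opt.step()
    # Long sleep phase: contrastive term of the EBM objective; it raises energies
    # overall, so labels not reinforced in the wake phase drift away from the correct one.
    for _ in range(sleep_steps):
        opt.zero_grad()
        e = model(x)
        torch.logsumexp(-e, dim=1).mean().backward()
        opt.step()

# Tiny usage.
model = ToyEBM(dim=8, n_classes=5)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(32, 8), torch.randint(0, 5, (32,))
wake_sleep_cycle(model, opt, x, y)
```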
H. Zhu, M. Majzoubi, A. Jain, A. Choromanska, TAME: Task Agnostic Continual Learning using Multiple Experts, in the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Continual Learning in Computer Vision, 2024 pdf
Abstract: The goal of lifelong learning is to continuously learn from non-stationary distributions, where the non-stationarity is typically imposed by a sequence of distinct tasks. Prior works have mostly considered idealistic settings, where the identity of tasks is known at least at training. In this paper we focus on a fundamentally harder, so-called task-agnostic, setting where the task identities are not known and the learning machine needs to infer them from the observations. Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner. At training, the strategy for switching between tasks hinges on an extremely simple observation that for each new coming task there occurs a statistically-significant deviation in the value of the loss function that marks the onset of this new task. At inference, the switching between experts is governed by the selector network that forwards the test sample to its relevant expert network. The selector network is trained on a small subset of data drawn uniformly at random. We control the growth of the task expert networks as well as selector network by employing pruning. Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming the previous task-agnostic methods and even the techniques that admit task identities at both training and testing, while at the same time using a comparable model size.
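The task-shift trigger can be sketched as a running-statistics test on the loss, as below; the window size, z-score threshold, and reset rule are assumptions of this illustration, not the exact statistical test used by TAME.

```python
import numpy as np

class TaskShiftDetector:
    """Flag a new task when the current loss deviates significantly
    from the running statistics of recent losses."""
    def __init__(self, window=200, z_threshold=3.0):
        self.window, self.z_threshold = window, z_threshold
        self.history = []

    def update(self, loss_value):
        self.history.append(loss_value)
        recent = self.history[-self.window:]
        if len(recent) < self.window:
            return False                          # not enough statistics yet
        mean = np.mean(recent[:-1])
        std = np.std(recent[:-1]) + 1e-8
        if (loss_value - mean) / std > self.z_threshold:
            self.history = [loss_value]           # restart statistics for the new task
            return True                           # onset of a new task detected
        return False
```

On a detected shift, the learner would freeze the current expert and switch to (or spawn) a new one; at test time, a separately trained selector network routes each sample to its relevant expert.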
Y. Teng, A. Choromanska, M. Campbell, S. Lu, P. Ram, L. Horesh, Overcoming Catastrophic Forgetting via Direction-Constrained Optimization, in the European Conference on Machine Learning and Data Mining (ECML-PKDD), 2022. Acceptance Rate [26%]. pdf
Abstract: This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods. The codes are publicly available at https://github.com/yunfei-teng/DCO.
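A compact sketch of the two ingredients follows: extracting the top principal ("forbidden") directions from the post-convergence trajectory and penalizing movement along them. The SVD shortcut stands in for the linear autoencoders used in the paper, and the regularization strength is an assumed value.

```python
import numpy as np

def top_forbidden_directions(trajectory, k=5):
    """Top principal directions of the post-convergence optimizer trajectory.
    trajectory: array of shape (n_snapshots, n_params), rows are flattened parameters."""
    centered = trajectory - trajectory.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                                 # shape (k, n_params)

def dco_like_penalty(params, anchor, directions, strength=10.0):
    """Regularizer discouraging movement along the forbidden directions of a past task."""
    delta = params - anchor                       # displacement from that task's solution
    proj = directions @ delta                     # components along forbidden directions
    return strength * float(proj @ proj)

# When training the next task, the total loss would be the task loss plus one
# dco_like_penalty term per previously learned task.
```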
D. Bisla, J. Wang, A. Choromanska, Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape, in the International Conference on Artificial Intelligence and Statistics (AISTATS), 2022. Acceptance Rate [29%]. pdf
Abstract: In this paper, we study the sharpness of a deep learning (DL) loss landscape around local minima in order to reveal systematic mechanisms underlying the generalization abilities of DL models. Our analysis is performed across varying network and optimizer hyper-parameters, and involves a rich family of different sharpness measures. We compare these measures and show that the low-pass filter based measure exhibits the highest correlation with the generalization abilities of DL models, has high robustness to both data and label noise, and furthermore can track the double descent behavior for neural networks. We next derive the optimization algorithm, relying on the low-pass filter (LPF), that actively searches the flat regions in the DL optimization landscape using SGD-like procedure. The update of the proposed algorithm, that we call LPF-SGD, is determined by the gradient of the convolution of the filter kernel with the loss function and can be efficiently computed using MC sampling. We empirically show that our algorithm achieves superior generalization performance compared to the common DL training strategies. On the theoretical front we prove that LPF-SGD converges to a better optimal point with smaller generalization error than SGD.
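The smoothed-gradient step can be written in a few lines, as in the sketch below; the Gaussian kernel width, sample count, and toy usage are illustrative assumptions.

```python
import numpy as np

def lpf_gradient(grad_fn, w, sigma=0.05, n_samples=8, rng=np.random.default_rng(0)):
    """MC estimate of the gradient of the loss convolved with a Gaussian kernel:
    average gradients at parameters perturbed by N(0, sigma^2 I) noise."""
    g = np.zeros_like(w)
    for _ in range(n_samples):
        g += grad_fn(w + sigma * rng.standard_normal(w.shape))
    return g / n_samples

def lpf_sgd_like(grad_fn, w, lr=0.1, steps=100, **filter_kwargs):
    """Gradient descent that follows the low-pass filtered gradient, favoring flat regions."""
    for _ in range(steps):
        w = w - lr * lpf_gradient(grad_fn, w, **filter_kwargs)
    return w

# Toy usage on a quadratic.
w_star = lpf_sgd_like(lambda w: 2.0 * w, np.array([3.0, -2.0]), sigma=0.1)
```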
Students and Alumni
Prof. Choromanska's current PhD students:
Fatemeh Naeinian
fn2174@nyu.edu
PhD candidate
School of Engineering Fellowship holder
Tolga Dimlioglu
td2249@nyu.edu
PhD candidate
School of Engineering Fellowship holder
Summer Intern at Siemens, Summer 2023
Summer Intern at NVIDIA, Summer 2023 and Summer 2024
Haoran Zhu
hz1922@nyu.edu
PhD candidate
Kristi Topollai
kt2664@nyu.edu
PhD candidate
School of Engineering Fellowship holder
Prof. Choromanska's former PhD students:
Yunfei Teng
Employer after graduation:
School of Engineering Fellowship holder (PhD program)
Morse Fellowship holder (Master's program)
Theodor Tamir Awardee for the best MS thesis in Electrical and Computer Engineering
Summer Intern at Facebook Research New York: Summer 2022
Summer Intern at ByteDance (the parent company of TikTok), Summer 2021
Summer Intern at Facebook Research New York: Summer 2020
Summer Intern at IBM T. J. Watson Research Center: Summer 2019
Summer Intern at NVIDIA: Summer 2018
Jing Wang
Employer after graduation: Huawei Technologies Co., Ltd
Dean's Fellowship holder
Summer Intern at Recurrent AI (SaleTech company in China), Summer 2022
Summer Intern at Haihua Institute for Frontier Information Theory (research center jointly hosted by Institute for Interdisciplinary Information Sciences at Tsinghua University and Beijing Haidian District government), Summer 2021
Shihong Fang
Employer after graduation: Nuro
Summer Intern at NVIDIA: Summer 2020
Maryam Majzoubi
Employer after graduation: Google (Google Lens team)
School of Engineering Fellowship holder
Summer Intern at Google Research New York: Summer 2020
Summer Intern at Microsoft Research New York: Summer 2019
Apoorva Nandini Saridena
Employer after graduation: NVIDIA (Holmdel, New Jersey location)
Summer Intern at NVIDIA: Spring 2021, Spring 2020, Summer 2019, Summer 2018
Devansh Bisla
Employer after graduation: NVIDIA (Holmdel, New Jersey location)
School of Engineering Fellowship holder
Summer Intern at Microsoft: Summer 2021
Summer Intern at NVIDIA: Summer 2020
Summer Intern at Hearst: Summer 2018
Prof. Choromanska's former Master's students:
Harshal Kulkarni (thesis advising)
Haoze He (thesis advising)
Haoran Zhu (thesis advising)
Yunfei Teng (thesis advising)
Devansh Bisla (thesis advising)
Apoorva Nandini Saridena (thesis advising)
Shreya Kadambi (thesis advising)
Sachit Nagpal (thesis advising)
Jatin Palchuri (thesis advising)
Cameron Archibald Johnson (thesis advising)
Suchetha Siddagangappa
Karnik Panchal
Graph Thongwat
Twishikana Bhattacharjee
Yifan Yang
Yilu Peng
Rishabh Bahuguna
Arihant Jain
Vaibhav Singh
Zhenyuan Dong
Saumya Pandey
Prithvi Naidu
Graduate students that Prof. Choromanska advised on selected projects:
Benjamin Cowen (PhD; advising on projects that became parts of his PhD thesis)
Naman Patel (PhD)
Jing Wang (Master's)
Ish Kumar Jain (Master's)
Arihant Jain (Master's)
Undergraduate students that Prof. Choromanska advised on selected projects:
Cesar Lema (Undergraduate Senior Project)
Munib Mesinovic (Undergraduate Summer Research Program)
ECE Seminar Series on Modern Artificial Intelligence
Prof. Choromanska founded the ECE Seminar Series on Modern Artificial Intelligence at NYU Tandon.
The series aims to bring together faculty, students, and researchers to discuss the most important trends in the world of AI. The talks are live-streamed and viewed around the globe, helping to spread the word about the work going on in the AI community. The invited speakers are world-renowned experts whose research is making an immense impact on the development of new machine learning techniques and technologies.
The seminar has become a flagship venue of the NYU Tandon School of Engineering, attracting a broad audience from industry (major tech companies as well as start-ups), academia (universities from the New York and New Jersey areas), and even high schools (Prof. Choromanska collaborates with Brooklyn Technical High School and the all-girls Hewitt School, whose students attend the seminar).
Playlist of all seminar talks is here.
Website of the seminar is here.
Prof. Choromanska also established the ECE Machine Learning Reading Group "Mambo with Machine Learning" at NYU Tandon.
List of past invited speakers: Hal Daume III, Augustin Chaintreau, Shipra Agrawal, Brian Kingsbury, Suman Jana, Jennifer Wortman Vaughan, Narges Razavian, Larry Jackel, Irina Rish, Robert Schapire, Alina Beygelzimer, Mariusz Bojarski, Krzysztof Choromanski
Broader Impacts
Prof. Choromanska and the members of her laboratory support the participation of women and underrepresented minority groups in STEM fields and promote the participation of undergraduate and high-school students in STEM. Prof. Choromanska is engaged in building a racially and ethnically diverse workforce in STEM and is an active member of Women at Tandon, serving on the committee for creating the Women's Center at NYU Tandon. Her students participate in the peer-to-peer mentoring program for women in STEM. The photo below shows Dean J. Kovacevic, Prof. M. Veloso (a speaker in the ECE Seminar Series on Modern Artificial Intelligence at NYU Tandon), Prof. Choromanska, and Prof. I. Rish promoting women in STEM.
In July 2018, Prof. Choromanska, together with her PhD student S. Fang and undergraduate student L. Nertomb, organized the K12 ARISE Summer High School Program for 12 high school students, titled "AI4AV: autonomous driving with deep learning models", offering a three-week training in machine learning, deep learning optimization, and autonomous driving. Prof. Choromanska also participated in the K12 ARISE Summer High School Program in Summer 2019, Summer 2020, Summer 2021, Summer 2022, and Summer 2024, each time offering a two-month research training for high-school students. Her PhD students Kristi Topollai, Tolga Dimlioglu, Devansh Bisla, Apoorva Nandini Saridena, and Haoran Zhu assisted her in organizing these programs. A total of 16 high-school students participated in the training programs from 2019 to 2024. Prof. Choromanska's programs are organized under the motto "it takes a spark to ignite a fire" and their goal is to motivate high school students to choose a career path in STEM. Selected photos from her programs are shown below.
LSL Sponsors
Prof. Choromanska and the LSL members are grateful to their research sponsors:
- NSF
- DARPA
- Alfred P. Sloan Foundation
- NVIDIA
- NXP
- Con Edison
- BAE
Information for candidates
I am always looking for strong candidates for the PhD program in the areas of Deep Learning (with an emphasis on Optimization and Training Methods for Deep Learning Systems), Robotics (with an emphasis on Autonomous Driving), and general Machine Learning.