
Audio-Visual Vehicle Localization for Urban Traffic Monitoring

Transportation & Infrastructure, Urban


Project Sponsor:

Project Abstract

Monitoring road traffic is key to ensuring road-user safety and smooth operation. Increasing traffic volumes raise commuters' stress levels and increase noise levels in communities, leading to health problems. Local authorities need reliable monitoring systems to craft policies that mitigate these effects. Ideally, automatic monitoring systems should be able not only to count vehicles but also to detect the type of vehicle (e.g. car, truck). In this project we aim to develop a system that leverages audio-visual data for the robust classification and localization of vehicles in the wild.


Project Description & Overview

This project investigates the use of audio-visual self-supervised deep learning models for the localization of vehicles in urban settings, as a step towards building efficient urban mobility systems. Instead of relying on labelled data, self-supervised models learn by identifying intrinsic characteristics of the data, which they use to accomplish a given task. These models can be trained on unlabelled recordings and images of natural scenes, which are abundant (e.g. YouTube videos), and they tend to outperform supervised models in practice.
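To make the idea concrete, the sketch below shows the kind of audio-visual correspondence objective such models are often built on: audio and frames from the same clip are pulled together in embedding space, mismatched pairs are pushed apart, and no labels are involved. This is a minimal PyTorch illustration, not the specific model the project will adapt; the encoder modules are assumed placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AVCorrespondence(nn.Module):
    """Minimal audio-visual correspondence model with an InfoNCE-style loss.

    vision_encoder and audio_encoder are placeholders for any backbones
    that map video frames / audio spectrograms to fixed-size vectors.
    """

    def __init__(self, vision_encoder: nn.Module, audio_encoder: nn.Module,
                 temperature: float = 0.07):
        super().__init__()
        self.vision_encoder = vision_encoder
        self.audio_encoder = audio_encoder
        self.temperature = temperature

    def forward(self, frames: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # Embed both modalities and L2-normalize so dot products are cosines.
        v = F.normalize(self.vision_encoder(frames), dim=-1)  # (B, D)
        a = F.normalize(self.audio_encoder(audio), dim=-1)    # (B, D)

        # Similarity of every frame embedding with every audio embedding.
        logits = v @ a.t() / self.temperature                 # (B, B)

        # The matching pair for clip i sits on the diagonal: the natural
        # pairing of sound and image in raw video is the only supervision.
        targets = torch.arange(v.size(0), device=v.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))
```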

This project consists of three stages:

  1. Exploratory analysis of a well-curated dataset of audio-visual urban data, to get familiar with the data and the problem (see the sketch after this list);
  2. Adaptation of a state-of-the-art self-supervised audio-visual model to work with this data;
  3. Analysis, evaluation and visualization of results and document writing.
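As an illustration of the kind of pass stage 1 involves, a short pandas/seaborn exploration of clip-level metadata might look like the following. The file name and columns (`vehicle_type`, `lighting`, `snr_db`) are hypothetical, since the schema of the team's dataset is not specified here.

```python
import pandas as pd
import seaborn as sns

# Hypothetical clip-level metadata; file and column names are illustrative.
meta = pd.read_csv("clips_metadata.csv")

# How many clips per vehicle type, and under which lighting conditions?
print(meta["vehicle_type"].value_counts())
print(pd.crosstab(meta["vehicle_type"], meta["lighting"]))

# Distribution of audio signal-to-noise ratio per vehicle type, to flag
# noisy subsets that could define harder difficulty levels.
sns.boxplot(data=meta, x="vehicle_type", y="snr_db")
```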

We will use data and code resources from previous work within our team. The goal of the project is to answer the questions: How well can we localize vehicles in urban settings with self-supervised models? Which conditions (e.g. poor lighting or noisy environments) affect the performance of these systems the most?
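One way to make "how well can we localize" measurable is the overlap between a predicted vehicle box and the annotated one. The sketch below computes plain intersection-over-union; the project's actual evaluation protocol is an open choice, so treat this as one reasonable baseline metric.

```python
def iou(pred_box, true_box):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(pred_box[0], true_box[0])
    iy1 = max(pred_box[1], true_box[1])
    ix2 = min(pred_box[2], true_box[2])
    iy2 = min(pred_box[3], true_box[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_true = (true_box[2] - true_box[0]) * (true_box[3] - true_box[1])
    union = area_pred + area_true - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction shifted slightly off the annotated box.
print(iou((10, 10, 50, 50), (15, 12, 55, 52)))  # ~0.71
```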

This project is a continuation of a previous Capstone project, which students can review for examples of the type of work this team will be doing.


Datasets

We will use an audio-visual road traffic monitoring dataset collected by the MARL team.


Competencies

The students should be comfortable with Python and familiar with data analysis tools such as the pandas and seaborn packages. A machine learning background is also desirable: basic classification models such as random forests, and train/test splits for evaluation (see the sketch below).
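As a self-check for that background, the workflow named above (a random forest plus a train/test split) fits in a few lines of scikit-learn; the synthetic data here stands in for real audio-visual features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for real audio/visual features and vehicle labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hold out a test set so evaluation reflects unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```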


Learning Outcomes & Deliverables

To conduct such a project we need annotated audio-visual data to train the system and assess its performance. We will use our team's data for that, and the first deliverable will be an analysis of the dataset, its challenges, and a definition of subsets of the data that pose the problem at different levels of difficulty.

Secondly, the students will get familiar with a state-of-the-art self-supervised model (whose code is publicly available) and adapt it to work on our data. This model was trained on large amounts of data and has proved successful in many cases, but it has not yet been tested on urban data. The second deliverable will therefore be the code adaptation of this model to work with the new data, together with a short technical report on the changes needed.

Finally, the students will use the model to localize vehicles in urban settings and perform an ablation study of how different conditions affect the performance of the system (a minimal sketch of this analysis follows). The final deliverable will be a report summarizing the work carried out, visualizations of the predictions the adapted model produced, and the main conclusions, along with the associated code.
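For the ablation study, one lightweight approach is to attach per-clip condition tags to the model's localization scores and aggregate. The column names (`lighting`, `noise_level`, `iou`) and the numbers are hypothetical placeholders for whatever annotations and metrics the project ends up using.

```python
import pandas as pd

# Hypothetical per-clip results: one localization score plus the
# recording conditions under which each clip was captured.
results = pd.DataFrame({
    "lighting":    ["day", "day", "night", "night", "dusk"],
    "noise_level": ["low", "high", "low", "high", "high"],
    "iou":         [0.81, 0.74, 0.52, 0.35, 0.60],
})

# Mean localization quality per condition shows which factor
# degrades performance most (here, night plus high noise).
print(results.groupby(["lighting", "noise_level"])["iou"].mean())
```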


Students

Yuehang Chen, Jacob Jiang, Suraj Sunil, Keren Zhang