V2X-Sim - Collaborative Perception for Self-Driving in Urban Scenes

Project Sponsor:

 


Project Abstract

Vehicle-to-everything (V2X), which refers to collaboration between a vehicle and any entity in its vicinity via communication, could significantly improve perception in self-driving systems. Due to a lack of publicly available V2X datasets, collaborative perception has not progressed as quickly as single-agent perception. For this capstone project, we present V2X-Sim, the first public synthetic collaborative perception dataset for urban driving scenarios. The team will train, test and deploy computer vision (CV) and deep learning (DL) models for collaborative perception on the V2X-Sim dataset.

Project Description & Overview

Vehicle-to-everything (V2X) denotes collaboration between a vehicle and other entities, including vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication. It helps self-driving vehicles see farther, see better and even see through occlusions, thereby fundamentally improving safety. According to an estimate by the U.S. NHTSA, implementing a V2V system would reduce traffic accidents by at least 13%, which amounts to 439,000 fewer crashes every year.

The V2X-Sim project aims to provide lightweight collaborative perception technologies for use in urban driving scenarios. The main tasks are to: (1) design a high-performance, low-bandwidth multi-agent collaboration strategy in the high-dimensional feature space, (2) develop an effective and efficient multimodal learning framework based on RGB images and LiDAR point clouds, and (3) improve the system's robustness against communication latency and sensor noise.
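
To make task (1) concrete, the sketch below shows one common pattern for feature-space collaboration: each agent compresses its intermediate feature map before transmission to save bandwidth, and the ego agent fuses what it receives with its own features. The module name, layer sizes, and max-based fusion are illustrative assumptions, not the project's prescribed design.

```python
import torch
import torch.nn as nn
from typing import List


class FeatureCollaboration(nn.Module):
    """Toy intermediate-fusion module: agents share compressed feature maps
    instead of raw sensor data, and the ego agent fuses the received maps
    with its own. All layer sizes are illustrative."""

    def __init__(self, in_channels: int = 256, compressed_channels: int = 32):
        super().__init__()
        # 1x1 convolutions that compress/decompress features to cut bandwidth
        self.compress = nn.Conv2d(in_channels, compressed_channels, kernel_size=1)
        self.decompress = nn.Conv2d(compressed_channels, in_channels, kernel_size=1)

    def forward(self, ego_feat: torch.Tensor, neighbor_feats: List[torch.Tensor]) -> torch.Tensor:
        # Simulate transmission: compress, then decompress, each neighbor's features.
        received = [self.decompress(self.compress(f)) for f in neighbor_feats]
        # Element-wise max fusion (one simple choice; attention-based fusion is
        # another common option in the collaborative perception literature).
        fused = ego_feat
        for r in received:
            fused = torch.maximum(fused, r)
        return fused


# Example: an ego agent fusing features broadcast by two neighboring agents.
fusion = FeatureCollaboration()
ego = torch.randn(1, 256, 32, 32)
neighbors = [torch.randn(1, 256, 32, 32) for _ in range(2)]
print(fusion(ego, neighbors).shape)  # torch.Size([1, 256, 32, 32])
```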

The Capstone team will train, test and deploy CV/DL models for collaborative perception tasks, including multi-agent collaborative detection, tracking and segmentation. The existing V2X-Sim dataset will be used for training and evaluation. Data-driven multi-agent 3D scene understanding methods could also be explored.
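
Inside such models, the multimodal framework of task (2) must combine camera and LiDAR information. The sketch below shows one widely used option: project both modalities into a shared bird's-eye-view (BEV) feature space and concatenate them. The layer sizes and the assumption of pre-computed BEV features are illustrative, not the framework the team will necessarily adopt.

```python
import torch
import torch.nn as nn


class MultimodalFusion(nn.Module):
    """Toy camera + LiDAR fusion head operating on bird's-eye-view (BEV)
    feature maps. Channel counts and BEV resolution are placeholders."""

    def __init__(self, img_channels: int = 64, lidar_channels: int = 64, out_channels: int = 128):
        super().__init__()
        self.img_proj = nn.Conv2d(img_channels, out_channels // 2, kernel_size=1)
        self.lidar_proj = nn.Conv2d(lidar_channels, out_channels // 2, kernel_size=1)
        self.head = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, img_bev: torch.Tensor, lidar_bev: torch.Tensor) -> torch.Tensor:
        # Both inputs are assumed to be BEV feature maps of the same spatial size,
        # e.g. from an image backbone with view transformation and a voxelized
        # point-cloud encoder, respectively.
        fused = torch.cat([self.img_proj(img_bev), self.lidar_proj(lidar_bev)], dim=1)
        return self.head(fused)


# Example forward pass with dummy BEV features from the two modalities.
model = MultimodalFusion()
out = model(torch.randn(1, 64, 128, 128), torch.randn(1, 64, 128, 128))
print(out.shape)  # torch.Size([1, 128, 128, 128])
```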

The team will work through 3 stages:

  • High-dimensional feature-based collaboration model (30%)
    • The team will train and test a model on the V2X-Sim dataset.
  • Multimodal learning framework (40%)
    • The team will design a multimodal learning framework based on different sensory inputs.
  • Robustness investigation (30%)
    • The team will test the robustness of the model against realistic noise and make improvements (a minimal noise-injection sketch follows this list).
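
As a starting point for the robustness stage, the minimal sketch below injects two kinds of realistic perturbation: Gaussian jitter plus random dropout on the LiDAR points, and a simple frame-delay model for communication latency. The noise magnitudes are placeholders rather than calibrated sensor parameters, and the evaluation metric is left to the team.

```python
import numpy as np


def add_lidar_noise(points: np.ndarray, sigma: float = 0.02, drop_rate: float = 0.1) -> np.ndarray:
    """Perturb an (N, 3) point cloud with Gaussian jitter and random point dropout.
    sigma and drop_rate are illustrative values, not calibrated sensor parameters."""
    keep = np.random.rand(len(points)) > drop_rate              # random point dropout
    return points[keep] + np.random.normal(0.0, sigma, size=(int(keep.sum()), 3))


def delayed_message(history: list, latency: int = 2):
    """Return the feature map an agent would have received `latency` frames ago,
    simulating communication delay (falls back to the oldest available frame)."""
    return history[max(len(history) - 1 - latency, 0)]


# Demo on a synthetic cloud: sweep noise levels before feeding the perturbed
# input to the detector and re-computing the chosen evaluation metric.
cloud = np.random.uniform(-50, 50, size=(10000, 3))
for sigma in (0.0, 0.02, 0.05, 0.1):
    noisy = add_lidar_noise(cloud, sigma=sigma)
    print(f"sigma={sigma}: kept {len(noisy)} of {len(cloud)} points")
```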

Alternatively, the team may choose to build a real-world V2X dataset, which is to be discussed with the PI.

Datasets

Given that building a collaborative perception dataset in the real world can be costly and laborious, we built a virtual dataset to advance collaborative perception research. Specifically, we employ SUMO, a microscopic traffic simulator, to produce numerically realistic traffic flow, and CARLA, a widely used open-source simulator for autonomous driving research, to retrieve the sensor streams of multiple vehicles located at the same intersection. We also mount sensors on the traffic lights so that the roadside can perceive the environment, and the sensor streams of the vehicles and the roadside infrastructure are synchronized to ensure smooth collaboration. In addition, multi-modality sensor streams of different entities are recorded to enable cross-modality perception. Finally, diverse annotations, including bounding boxes, vehicle trajectories, and pixel-wise as well as point-wise semantic labels, are provided to facilitate various downstream tasks.
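
For orientation, the sketch below shows one plausible way to represent a single synchronized multi-agent frame of such a dataset in Python. The class, field names, and array shapes are hypothetical placeholders and do not describe the actual V2X-Sim schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np


@dataclass
class AgentFrame:
    """One agent's synchronized sensor data and labels for a single time step.
    Field names and shapes are hypothetical, not the dataset's actual schema."""
    agent_id: str                              # a vehicle or a roadside unit
    rgb: np.ndarray                            # (H, W, 3) camera image
    lidar: np.ndarray                          # (N, 4) points: x, y, z, intensity
    boxes_3d: List[np.ndarray] = field(default_factory=list)  # 3D bounding boxes
    point_labels: Optional[np.ndarray] = None  # per-point semantic labels


# A synthetic stand-in for one collaborative frame: two vehicles plus a
# roadside unit mounted on a traffic light, all time-synchronized.
frame = [
    AgentFrame("vehicle_0", np.zeros((900, 1600, 3), np.uint8), np.random.rand(30000, 4)),
    AgentFrame("vehicle_1", np.zeros((900, 1600, 3), np.uint8), np.random.rand(30000, 4)),
    AgentFrame("rsu_0", np.zeros((900, 1600, 3), np.uint8), np.random.rand(30000, 4)),
]
print([a.agent_id for a in frame])
```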

Alternatively, if the team chooses to build a real-world V2X dataset, then we will collect real-world visual data (mainly images).

Competencies

  • Technical experience
    • Python programming (required for >=2 team members)
      • Good code and data management skills
    • Machine learning
      • Python SciPy stack and PyTorch DL library
      • Dimensionality reduction
      • Federated learning
      • Multimodal learning
    • Computer vision experience
    • Data processing pipelines
  • Documentation
    • Data management experience
    • Privacy and data
    • Ethics

Learning Outcomes & Deliverables

The team will apply a broad range of urban analytics approaches, building demonstrated abilities in computer vision, data science, and machine learning.

The expected deliverables for each project stage are:

  • A high-dimensional feature-based collaborative perception model trained on the provided V2X-Sim data.
  • A multimodal learning framework which supports both RGB image and LiDAR point cloud.
  • A report of the robustness investigation under different levels of realistic noise.

All deliverables will be committed to a well-documented public GitHub repository.

Students

Zizhen Chen, Xiangkun Fang, Shicheng Jin, Yanzhou Zhu