Building Accessible City by Self-Supervised Visual Place Recognition

Transportation & Infrastructure, Urban


Project Sponsor:

 


Project Abstract

Visual Place Recognition (VPR) aims to identify previously visited places during navigation. In the context of SLAM, VPR is a key component of relocalization when tracking is lost. Existing learning-based visual place recognition methods are generally supervised and require extra sensors (GPS or a wheel encoder) to provide ground-truth location labels. In contrast, we want to design a self-supervised method for visual place recognition that can reliably recognize visited locations in a single-scene environment without any ground-truth labels. The method should be able to handle a variety of input modalities, including point clouds and RGB images.


Project Description & Overview

Visual place recognition (VPR), which aims to identify re-visited places from visual information, is a well-known and challenging problem in the computer vision and robotics communities because of variation in visual appearance. VPR is crucial in autonomous navigation systems and is closely related to re-localization, loop closure detection, and image retrieval.

Most state-of-the-art methods are supervised and require geographical location information in the training dataset. For most indoor scenarios, geospatial information such as GPS is not available for supervised training; conversely, when GPS is available, visual place recognition becomes less essential. Supervised methods therefore face a paradoxical situation.

The team will work through three stages:

  • General self-supervised model and supervised model as baselines (40%)
    • Supervised model: the team will implement a supervised VPR method as a baseline
    • Self-supervised model: the team will modify the baseline into a self-supervised version that relies purely on temporal information (see the sketch after this list)
  • Literature review on visual place recognition (10%)
    • The team will complete a review of visual place recognition methods
  • Design and construct a suitable model for image and point cloud datasets (50%)
    • Create a large-scene point cloud dataset based on the given template
    • Create a Habitat-Sim environment for an image dataset using a 360-degree RGB-D camera
    • Design a new framework to close the performance gap between supervised and self-supervised methods
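
One natural reading of a self-supervised model that relies "purely on temporal information" is to treat temporally adjacent frames as positive pairs and the remaining frames in a batch as negatives under an InfoNCE-style contrastive loss. The sketch below illustrates that idea with a generic ResNet-18 descriptor network; the class names, embedding size, and training loop are illustrative assumptions, not the project's actual code.

```python
# Minimal sketch: InfoNCE-style contrastive training where positives are
# temporally adjacent frames and negatives are the other frames in the batch.
# All module names and hyperparameters below are illustrative placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models


class DescriptorNet(nn.Module):
    """ResNet-18 backbone with a projection head producing an
    L2-normalized global image descriptor."""

    def __init__(self, embed_dim: int = 256):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # keep the 512-d pooled features
        self.backbone = backbone
        self.proj = nn.Linear(512, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        return F.normalize(self.proj(feats), dim=-1)


def info_nce(anchors: torch.Tensor, positives: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Each anchor's positive is the descriptor of its temporally adjacent
    frame; other positives in the batch serve as negatives."""
    logits = anchors @ positives.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)


def train_one_epoch(model, loader, optimizer, device="cuda"):
    model.train()
    for frame_t, frame_t_plus_k in loader:           # temporally close pair
        frame_t = frame_t.to(device)
        frame_t_plus_k = frame_t_plus_k.to(device)
        loss = info_nce(model(frame_t), model(frame_t_plus_k))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Descriptors trained this way can later be compared with nearest-neighbor search to decide whether a place has been re-visited, without any GPS or wheel-encoder labels.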

Datasets

  • Image dataset from previous year: https://ai4ce.github.io/NYU-VPR/
  • Existing point cloud datasets can be used for initial testing. Once the new model is developed, it should be deployed on a real 2D 360-degree point cloud dataset and on a 2D simulated Habitat-Sim environment. The team could also collect an image dataset itself (a minimal frame-pairing sketch follows this list).
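
The exact file layout of the NYU-VPR release or of a self-collected sequence is not specified here, so the following sketch only assumes a directory of frames whose sorted filenames reflect capture order; the directory path, file extension, and `max_offset` parameter are illustrative. It shows how temporally adjacent frames could be paired for the self-supervised training step sketched above.

```python
# Minimal sketch of a temporal-pair dataset, assuming a single directory of
# sequential frames whose sorted filenames reflect capture order.
# The directory path and offset range are illustrative placeholders.
import random
from pathlib import Path

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class TemporalPairDataset(Dataset):
    def __init__(self, frame_dir: str, max_offset: int = 3, size: int = 224):
        self.paths = sorted(Path(frame_dir).glob("*.png"))
        self.max_offset = max_offset
        self.tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
        ])

    def __len__(self) -> int:
        return len(self.paths) - self.max_offset

    def __getitem__(self, idx: int):
        # Anchor frame plus a temporally nearby frame (self-supervised positive).
        offset = random.randint(1, self.max_offset)
        anchor = self.tf(Image.open(self.paths[idx]).convert("RGB"))
        positive = self.tf(Image.open(self.paths[idx + offset]).convert("RGB"))
        return anchor, positive
```

A standard PyTorch DataLoader over this dataset would feed the contrastive training step sketched earlier.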

Competencies

  • Machine learning
    • Contrastive learning
    • Supervised learning
    • Self/Weak-supervised learning
    • Computer vision experience
    • Feature learning
  • Python SciPy stack and PyTorch DL library
    • SLAM
    • Topology Mapping
    • Self-supervised localization
  • Technical experience
    • Python programming (required for >=2 team members)
    • Machine learning
    • Data processing pipelines
    • Documentation
  • Dataset experience
    • Create simulated datasets
    • Work with real datasets and examine how they differ from simulated datasets

Learning Outcomes & Deliverables

The expected deliverables for each project stage are:

  • A self-supervised model that meets a given minimum performance level on the provided test data (see the evaluation sketch below).
  • A literature review on visual place recognition, contrastive learning, and feature learning representations.

All deliverables will be based on Jupyter notebooks and committed to a well-documented public GitHub repository.
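
The proposal does not pin down the minimum performance level; VPR results are commonly reported as Recall@N over nearest-neighbor retrieval between query and database descriptors. The sketch below is one possible way to compute that metric, assuming precomputed L2-normalized descriptors and a boolean ground-truth match matrix (both illustrative inputs, not artifacts provided by the project).

```python
# Minimal Recall@N evaluation sketch for place recognition, assuming
# precomputed L2-normalized descriptors and a boolean matrix marking which
# database entries are true matches for each query (illustrative inputs).
import numpy as np


def recall_at_n(query_desc: np.ndarray, db_desc: np.ndarray,
                is_match: np.ndarray, n: int = 5) -> float:
    """query_desc: (Q, D), db_desc: (M, D), is_match: (Q, M) boolean."""
    sims = query_desc @ db_desc.T                    # cosine similarity
    top_n = np.argsort(-sims, axis=1)[:, :n]         # top-N database candidates
    hits = [is_match[q, top_n[q]].any() for q in range(query_desc.shape[0])]
    return float(np.mean(hits))
```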


Students

Chien Hsu, Gang Jiang, Chenhao Jin, Fanshu Li