Behavior Modeling Using Multi-Modal Mobility Data | NYU Tandon School of Engineering

Behavior Modeling Using Multi-Modal Mobility Data

Transportation & Infrastructure, Urban


Project Sponsor:

 


Project Abstract

Develop and demonstrate methods for analyzing patterns found in mobility data, such as the aggregate tracks of vehicle populations. A growing body of research indicates that information about the movements of populations (i.e., syntactic trajectories) provides informative patterns. This project will explore how multiple data sets can be jointly analyzed.


Project Description & Overview

1. The primary data corpus will be the traces for taxicabs in New York City.

2. Secondary sources of data include the following: (a) dates of movable holidays, e.g., Easter; (b) daily weather data, e.g., temperature highs and lows, precipitation, and windstorm conditions (see examples [1] [2]).

3. Prospective ancillary (tertiary) data sources could further include arrivals and departures at one (or more) major transportation hub(s):

  • Cruise line and ferry terminals
  • Airport(s) (e.g., LaGuardia Airport, JFK Airport, Newark Airport)
  • One (or more) train station(s) (e.g., Penn Station or Grand Central Station)
  • Sports/performance venues (e.g., Meadowlands, Yankee and Shea Stadiums, Barclays Center, Prudential Center) and convention centers (e.g., Madison Square Garden, Javits Center)
    • Corresponding schedules for major sporting events, performances, and conventions or trade shows.

These data will be employed to investigate patterns across syntactic traces/trajectories in order to perform exploratory data analysis and unsupervised learning related to the following questions:

  • What is the correspondence between the volume of taxi departures from a particular transit hub and the arrival times of trains or planes?
  • What diurnal patterns are evident?
  • How does the level of activity vary over the week (e.g., weekdays, weekends, holidays)?
  • What differences are exhibited under varying weather conditions?
  • What patterns are exhibited when comparing the trace data to the schedules of events? For example, is the destination of someone arriving by train more likely to be a sports/performance venue or a convention?
    • Do differences correspond to train arrival terminals, holiday date(s), time(s) of arrival (diurnal/nocturnal), or weather conditions (e.g., temperature, precipitation)?
  • Are transit disruptions (e.g., flight cancellations, rail delays) detectable by analyzing the taxi activity?
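
As one way to approach the first question above, the sketch below correlates a series of hub arrivals with a series of taxi pickups at several time lags and reports the lag with the strongest correlation. It is a minimal illustration, not the project's method: the function names (`pearson`, `lagged_correlations`), the choice of time bin, and the two count series are hypothetical, and the data are synthetic.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def lagged_correlations(arrivals, pickups, max_lag):
    """Correlate arrivals[t] with pickups[t + lag] for lag = 0..max_lag."""
    length = min(len(arrivals), len(pickups))
    result = {}
    for lag in range(max_lag + 1):
        n = length - lag  # number of overlapping bins at this lag
        result[lag] = pearson(arrivals[:n], pickups[lag:lag + n])
    return result


# Synthetic counts per time bin (assumption: fixed-width bins, e.g. 15 minutes).
# Pickups are constructed as a baseline of 1 plus the arrivals from 2 bins earlier.
arrivals = [0, 5, 0, 0, 8, 0, 0, 3, 0, 0, 6, 0]
pickups = [1, 1, 1, 6, 1, 1, 9, 1, 1, 4, 1, 1]

correlations = lagged_correlations(arrivals, pickups, max_lag=3)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

With real data, the lag at which the correlation peaks would suggest how long after a scheduled arrival the taxi departures tend to follow; a flat correlation profile would suggest no clear correspondence at the chosen bin width.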

Another adviser to this project is John Irvine, Department Manager for Civil Defense at MITRE Corporation. He is affiliated with RiskEcon® Lab as a Senior Science Advisor-in-Residence, holds a PhD in Mathematical Statistics from Yale, and has Adjunct Professor appointments on the Health Faculty at Queensland University of Technology and at the Institute for Glycomics at Griffith University. Before joining MITRE, he was the Chief Scientist for Data Analytics at The Charles Stark Draper Laboratory, Inc. With 40 years of professional experience, he has led numerous projects in remote sensing, served on multiple boards and advisory panels, and remains active in the research community, with over 200 journal and conference publications.

*References:


Datasets

The primary data corpus will be the traces for taxicabs in New York City. Secondary sources of data include the following: (a) dates of movable holidays, e.g., Easter; (b) daily weather data, e.g., temperature highs and lows, precipitation, and windstorm conditions (see examples [1] [2]).
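
Dates of movable holidays such as Easter do not need to be looked up by hand; they can be computed. The sketch below uses the anonymous Gregorian algorithm (often credited to Meeus, Jones, and Butcher) for Western Easter. The function name `gregorian_easter` is hypothetical, and deriving Good Friday from it is shown only as an example of building other movable dates from the same anchor.

```python
from datetime import date, timedelta


def gregorian_easter(year: int) -> date:
    """Date of Western (Gregorian) Easter Sunday via the anonymous Gregorian algorithm."""
    a = year % 19                      # position in the 19-year lunar cycle
    b, c = divmod(year, 100)           # century and year within the century
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


easter_2019 = gregorian_easter(2019)
good_friday_2019 = easter_2019 - timedelta(days=2)
print(easter_2019, good_friday_2019)
```

The resulting dates could then be attached to each day of taxi data as a holiday flag, so that holiday activity can be compared against ordinary weekdays and weekends.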


Competencies

  • Reasonable proficiency with statistical applications in NumPy, SciPy and/or R
  • Basic familiarity with exploratory data analysis, clustering, anomaly detection, and outlier analysis is helpful
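
To give a sense of the outlier analysis mentioned above, the sketch below flags unusual values in a series of taxi pickup counts using a modified z-score based on the median and the median absolute deviation (MAD). This is one common robust technique, chosen here for illustration rather than taken from the project; the function names, the 3.5 threshold, and the counts themselves are assumptions, and the data are synthetic.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores: 0.6745 * (x - median) / MAD; all zeros if MAD is 0."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, threshold=3.5):
    """Indices whose modified z-score exceeds the threshold in absolute value."""
    scores = robust_z_scores(values)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Synthetic pickup counts for the same hour of day on ten consecutive days
# (hypothetical values); day index 5 is deliberately far below the others.
counts = [120, 118, 125, 122, 119, 30, 121, 124, 117, 123]

outlier_days = flag_outliers(counts)
print(outlier_days)
```

Comparing like with like (the same hour of day, and ideally the same day of week) keeps ordinary diurnal and weekly variation from being mistaken for an anomaly. A flagged day is only a candidate; it would still need to be checked against weather, holiday, or disruption records.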

Learning Outcomes & Deliverables

  1. Problem-solving and experimental design skills with real-world application.
  2. Domain-specific application of statistical reasoning and hypothesis testing.

Students

Pupul Bhoumick, Vittorio Costa, Flora Gong, Junren Wang