Estimating Citywide Cycling Volumes Using Crowdsourced and Count Data | NYU Tandon School of Engineering

Transportation & Infrastructure, Urban


Project Sponsor:

Mojdeh Azad, Economic Data Scientist at New York City Department of Transportation
 

Mentor:

Rishabh S. Chauhan, Industry Assistant Professor at CUSP, NYU Tandon


Authors

Sichang Yu, Lucia Shi


Research Question

How can we use biased but broad-coverage crowdsourced cycling data, alongside limited official counts and contextual features, to estimate cycling volumes across the city network?


Background

The Economics team within the Policy Unit at the Commissioner's Office of NYCDOT conducts economic analysis for a wide range of proposed projects and programs, including major street redesigns, the fee structure for outdoor dining, new bridge preservation techniques, and overweight truck regulation. The team also manages DOT's Citywide Mobility Survey. The Policy Unit also plays a key role in generating new program ideas and has worked closely with other NYCDOT units to develop and implement agency projects.

This project aims to estimate cycling volumes by integrating the city's official bike count data with crowdsourced Strava cycling data. Using machine learning and geospatial analysis, the team will identify biases in the crowdsourced data, correct for them, and generate estimates of cycling activity on individual street segments. The resulting estimates will support more data-driven transportation planning.
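The simplest possible version of "correcting" Strava data against official counts is a single scaling factor: at sites where both an official count and a Strava count exist for the same segment and period, take their ratio, then apply a typical ratio to Strava counts on uncounted segments. The sketch below assumes a hypothetical input layout of (official, Strava) pairs and uses toy numbers, not real NYC data. It is offered only as a baseline to compare against; a single citywide factor ignores exactly the demographic and geographic variation that the project's machine learning models are meant to capture.

```python
from statistics import median


def expansion_factor(count_sites):
    """Median ratio of official count to Strava count.

    count_sites: iterable of (official_count, strava_count) pairs for the
    same segment and period (hypothetical layout). Sites with a zero Strava
    count are skipped because the ratio is undefined there.
    """
    ratios = [official / strava for official, strava in count_sites if strava > 0]
    if not ratios:
        raise ValueError("need at least one site with a nonzero Strava count")
    return median(ratios)


def scale_strava(strava_counts, factor):
    """Apply one citywide factor to Strava counts on uncounted segments."""
    return {seg: round(n * factor) for seg, n in strava_counts.items()}


# Toy numbers for illustration only.
sites = [(1200, 100), (900, 60), (450, 50)]  # ratios 12.0, 15.0, 9.0
k = expansion_factor(sites)
estimates = scale_strava({"seg_a": 40, "seg_b": 75}, k)
print(k, estimates)  # -> 12.0 {'seg_a': 480, 'seg_b': 900}
```

The median is used instead of the mean so that one unusual count site (for example, a bridge with heavy commuter traffic but few Strava users) does not dominate the factor.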


Methodology

Accurate data on cycling volumes is essential for planning infrastructure in the city. However, DOT typically relies on a limited number of bike counts, which are reliable but sparse. Strava, by contrast, provides a rich dataset of crowdsourced cycling activity, but it is known to overrepresent certain user types and trip purposes, and Strava's own processing of the data before release may introduce further limitations in the reported counts. This project will address the gap between sparse but reliable counts and broad but biased crowdsourced data by developing a data fusion and modeling framework to estimate cycling volumes across NYC street segments.

Students will compare Strava data with bike count data, build machine learning models that adjust for demographic and geographic biases, and generate estimates of cycling activity on road segments. The methodology will include spatial feature engineering, predictive modeling, and validation using spatial cross-validation. Final deliverables may include maps of estimated cycling volumes, model evaluation reports, and policy-relevant recommendations.
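Spatial cross-validation, as mentioned in the methodology, can be sketched as follows. Sites are grouped into square grid blocks and whole blocks are held out together, so nearby sites within the same block never appear in both the training and test sets; ordinary random splits would let spatial autocorrelation inflate the apparent accuracy. Everything here is an assumption for illustration: coordinates are taken to be projected metres, blocks are 1 km, there are 3 folds, the field names `x`, `y`, `strava`, and `count` are hypothetical, and the data are synthetic. The model is a one-variable log-linear regression of count on Strava count, swapped in for the project's machine learning models only to keep the cross-validation mechanics visible.

```python
import math


def block_of(x, y, size):
    """Grid cell containing a point; x and y are assumed to be projected metres."""
    return (math.floor(x / size), math.floor(y / size))


def fit_loglinear(rows):
    """Ordinary least squares fit of log1p(count) = a + b * log1p(strava)."""
    xs = [math.log1p(r["strava"]) for r in rows]
    ys = [math.log1p(r["count"]) for r in rows]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # flat model if Strava counts have no spread
    return my - b * mx, b


def predict(params, strava):
    """Back-transform the log-scale prediction to a count."""
    a, b = params
    return math.expm1(a + b * math.log1p(strava))


def spatial_block_cv(rows, block_size=1000.0, k=3):
    """Mean absolute error on each fold of held-out grid blocks."""
    blocks = sorted({block_of(r["x"], r["y"], block_size) for r in rows})
    fold_of = {blk: i % k for i, blk in enumerate(blocks)}  # each block lands in one fold
    maes = []
    for fold in range(k):
        held = [fold_of[block_of(r["x"], r["y"], block_size)] == fold for r in rows]
        train = [r for r, h in zip(rows, held) if not h]
        test = [r for r, h in zip(rows, held) if h]
        if not train or not test:
            continue  # skip folds that end up empty
        params = fit_loglinear(train)
        errors = [abs(predict(params, r["strava"]) - r["count"]) for r in test]
        maes.append(sum(errors) / len(errors))
    return maes


# Synthetic 6 x 6 grid of sites 600 m apart; not real NYC data.
rows = []
for i in range(6):
    for j in range(6):
        s = 5 + 3 * i + 2 * j
        noise = 1 + 0.1 * ((i + j) % 3 - 1)  # deterministic +/-10% wobble
        rows.append({"x": i * 600.0, "y": j * 600.0, "strava": s,
                     "count": math.expm1(0.5 + math.log1p(s)) * noise})

print(spatial_block_cv(rows))
```

This sketch does not leave a buffer zone between training and test blocks, so sites on either side of a block boundary can still fall into different sets; the block size controls how strict the separation is.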


Deliverables
  • A predictive model that estimates cycling volumes on road segments from Strava and count data
  • Interactive maps or dashboards showing estimated network-wide cycling volumes
  • A report summarizing findings, bias analysis, methodology, and policy implications

Data Sources
  1. Bike count data (NYCDOT)
  2. Strava Metro data
  3. Street network and land use (OpenStreetMap)
  4. Socio-demographic data (US Census)
  5. Weather (Open weather APIs)
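As one example of spatial feature engineering with the sources listed above, the sketch below counts OpenStreetMap-style points of interest within a fixed radius of each street segment's midpoint, using the haversine great-circle distance. The segment ID, the 250 m radius, the feature name, and the coordinates are all hypothetical. The brute-force loop is only meant to show the idea; a citywide pipeline would more likely use a spatial index or a GIS library.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def count_within(midpoint, points, radius_m):
    """Number of points within radius_m of a segment midpoint (brute force)."""
    lat, lon = midpoint
    return sum(1 for plat, plon in points
               if haversine_m(lat, lon, plat, plon) <= radius_m)


def segment_features(midpoints, amenities, radius_m=250.0):
    """One feature row per segment ID; the feature name is hypothetical."""
    name = f"amenities_{int(radius_m)}m"
    return {seg: {name: count_within(mid, amenities, radius_m)}
            for seg, mid in midpoints.items()}


# Hypothetical segment midpoint and amenity locations as (lat, lon).
midpoints = {"seg_a": (40.7300, -73.9950)}
amenities = [(40.7300, -73.9950), (40.7310, -73.9950), (40.7400, -73.9950)]
print(segment_features(midpoints, amenities))
# -> {'seg_a': {'amenities_250m': 2}}
```

The second amenity is about 111 m north of the midpoint and the third about 1.1 km north, so only the first two fall inside the 250 m radius.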