Estimating Citywide Cycling Volumes Using Crowdsourced and Count Data
Mojdeh Azad, Economic Data Scientist, NYC Department of Transportation
MENTOR:
Rishabh S. Chauhan, Industry Assistant Professor at NYU Tandon's Center for Urban Science + Progress
Authors
Sichang Yu, Lucia Shi
Research Question
How can biased but broad-coverage crowdsourced cycling data, alongside limited official counts and contextual features, be used to estimate cycling volumes across the city network?
Background
This project aims to estimate cycling volumes by integrating the city's bike count data with crowdsourced Strava cycling data. Using machine learning and geospatial analysis, the team will identify biases in the crowdsourced data, correct for them, and generate estimates of cycling activity on street segments. The results will support data-driven transportation planning.
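One simple way to make "identify and correct biases" concrete is a stratified expansion factor. At locations that have both an official count and Strava data, compare the two. Then scale Strava trips on uncounted segments by the ratio for their stratum. The sketch below is a minimal illustration of that baseline under stated assumptions, not the project's model. The record layout, the single `has_bike_lane` stratum, the function names `expansion_factors` and `estimate`, and all numbers are hypothetical toy values.

```python
from collections import defaultdict

# Hypothetical toy records: (segment_id, strava_trips, counter_volume, has_bike_lane).
# counter_volume is None where no official count exists.
segments = [
    ("s1", 40, 800, True),
    ("s2", 10, 300, False),
    ("s3", 25, 450, True),
    ("s4", 5, 200, False),
    ("s5", 30, None, True),
    ("s6", 8, None, False),
]

def expansion_factors(rows):
    """Ratio of total counted volume to total Strava trips, per stratum.

    Only segments with both a counter value and nonzero Strava trips are used.
    """
    counted = defaultdict(float)
    strava = defaultdict(float)
    for _, trips, volume, lane in rows:
        if volume is not None and trips > 0:
            counted[lane] += volume
            strava[lane] += trips
    return {lane: counted[lane] / strava[lane] for lane in counted}

def estimate(rows, factors):
    """Scale Strava trips by the stratum's factor on segments without a count."""
    return {
        seg: trips * factors[lane]
        for seg, trips, volume, lane in rows
        if volume is None and lane in factors
    }

factors = expansion_factors(segments)
estimates = estimate(segments, factors)  # estimates for s5 and s6 only
```

In this toy data the two strata get different factors (about 19 counted riders per Strava trip on bike-lane segments versus about 33 elsewhere). That is the kind of uneven representation a bias analysis looks for. A machine learning model generalizes the idea by letting the correction depend on many features at once.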
Urban cycling data is often incomplete or spatially limited, which hinders planning decisions. Drawing on recent academic methods that fuse sparse sensor data with crowdsourced sources, this project will replicate and adapt that approach for cycling. Students will spatially align Strava and counter data, engineer relevant features (e.g., bike lanes and other road characteristics, land use), and train predictive models to estimate bike ride volumes on city road segments. Model performance will be evaluated with spatial cross-validation.
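Spatial alignment usually starts by snapping each counter location to the nearest street segment. The sketch below makes several assumptions. Coordinates are already in a projected coordinate system measured in meters. Each segment is a single straight line between two points; real segments are polylines, whose distance would be the minimum over their pieces. The matching threshold is a hypothetical 25 m. `snap_counters` and `point_segment_distance` are illustrative names. In practice, a spatial library routine such as GeoPandas' `sjoin_nearest` would do the same job with a spatial index.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (projected coords)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: distance to its single point
        return math.hypot(px - ax, py - ay)
    # Parameter of the orthogonal projection, clamped to the segment's extent.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def snap_counters(counters, segments, max_dist=25.0):
    """Match each counter to its nearest segment within max_dist meters.

    counters: {counter_id: (x, y)}; segments: {segment_id: ((x1, y1), (x2, y2))}.
    Returns {counter_id: segment_id}; counters with no segment in range are omitted.
    """
    matches = {}
    for cid, point in counters.items():
        best_id, best_d = None, max_dist
        for sid, (a, b) in segments.items():
            d = point_segment_distance(point, a, b)
            if d <= best_d:
                best_id, best_d = sid, d
        if best_id is not None:
            matches[cid] = best_id
    return matches

# Toy example: two parallel east-west segments 50 m apart.
segments = {"seg_a": ((0.0, 0.0), (100.0, 0.0)), "seg_b": ((0.0, 50.0), (100.0, 50.0))}
counters = {"c1": (40.0, 5.0), "c2": (60.0, 48.0), "c3": (500.0, 500.0)}
matches = snap_counters(counters, segments)  # c3 is too far from both, so unmatched
```

Leaving distant counters unmatched, rather than forcing them onto the closest street regardless of distance, avoids attaching a count to a segment it does not measure.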
Methodology
Accurate cycling volume data is essential for planning cycling infrastructure in the city. However, the New York City Department of Transportation (DOT) typically relies on a limited number of bike counts, which are reliable but spatially sparse. Meanwhile, Strava provides a rich dataset of crowdsourced cycling activity, though it is known to overrepresent certain user types and trip purposes. In addition, Strava's own processing of the data before release may introduce further limitations in the reported counts. This project addresses the gap between sparse, reliable counts and broad but biased crowdsourced data by developing a data fusion and modeling framework to estimate cycling volumes across NYC street segments. The team is comparing Strava data with bike count data, building machine learning models that adjust for demographic and geographic biases, and generating estimates of cycling activity on road segments. The methodology includes spatial feature engineering, predictive modeling, and validation with spatial cross-validation.
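Spatial cross-validation holds out whole areas rather than random segments. This keeps a model from being scored on streets whose immediate neighbors, with correlated volumes, were in its training data. The minimal sketch below assumes counter sites with projected coordinates in meters. It groups them into square blocks of a hypothetical 1 km size, assigns blocks to folds round-robin, and scores a deliberately simple stand-in model: one citywide ratio of counts to Strava trips, fitted on the training folds. The function names and parameters are illustrative, not part of the project.

```python
import math

def block_id(x, y, block_size):
    """Index of the square grid cell (block_size meters wide) containing (x, y)."""
    return (math.floor(x / block_size), math.floor(y / block_size))

def spatial_folds(points, block_size, k):
    """Assign each site to one of k folds so that every site in a block shares a fold."""
    blocks = sorted({block_id(x, y, block_size) for x, y in points.values()})
    # Round-robin over sorted blocks; shuffling the blocks first is a common alternative.
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return {sid: fold_of_block[block_id(x, y, block_size)]
            for sid, (x, y) in points.items()}

def spatial_cv_mae(points, strava, counts, block_size=1000.0, k=3):
    """Mean absolute error of a single-ratio stand-in model under spatial block CV."""
    folds = spatial_folds(points, block_size, k)
    errors = []
    for f in range(k):
        train = [s for s in points if folds[s] != f]
        test = [s for s in points if folds[s] == f]
        denom = sum(strava[s] for s in train)
        if not test or denom == 0:
            continue  # skip empty folds or folds with no usable training data
        # Fit: one citywide ratio of counted riders to Strava trips (training blocks only).
        ratio = sum(counts[s] for s in train) / denom
        # Apply the ratio to the held-out blocks and record absolute errors.
        errors.extend(abs(ratio * strava[s] - counts[s]) for s in test)
    return sum(errors) / len(errors) if errors else math.nan
```

Replacing the stand-in with a real predictive model only changes the lines that fit and apply `ratio`. The block-based fold assignment stays the same.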
Deliverables
- A predictive model that estimates cycling volumes on road segments from Strava and count data
- Interactive maps or dashboards showing estimated network-wide cycling volumes
- A report summarizing findings, bias analysis, methodology, and policy implications