Talking to Urban Data
Generative AI for Natural Language Query-Based Urban Data Analytics
- Stanislav Sobolevsky, Ph.D., Institute Professor, CUSP at NYU Tandon
Authors
Satyajit Sriram, Yuhan Zheng
Research Question
How can a generative AI-powered system enable non-technical users to accurately query complex urban data, while leveraging frameworks with spatio-temporal and network structures to enhance relevance, accuracy, and scalability across multiple cities?
Background
Urban data is rich, but its complexity often puts it out of reach for non-technical users. This project introduces a generative AI framework for natural language querying of spatio-temporal urban datasets. By combining advanced prompt engineering, feedback-driven code fixing, and diverse data integration, it aims to provide accurate, scalable, and user-friendly urban analytics and to bridge the gap between complex data and everyday decision-making.
Methodology
The system uses generative AI models for natural language processing, translating user queries into executable analytic code. Queries are processed through Spatial-Temporal Transformer Networks (STTN), which return results, code, and metadata. A semantic validation and feedback loop checks each query result for logical consistency, with particular attention to geospatial and temporal nuances. Poorly written and misinterpreted queries are manually corrected, re-executed, and labeled by status, type, and format. The system was evaluated using real-world urban queries, and the results demonstrate the model's ability to interpret complex questions, retrieve relevant data, and provide user-friendly responses.
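The generate, execute, validate, and correct cycle described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `answer`, `validate`, `run_generated_code`, `QueryOutcome`, and the stub `fake_generate` are all hypothetical names, the model call is replaced by a stub, and the correction step is automated here only for illustration, whereas the project describes correcting queries manually.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class QueryOutcome:
    """Record kept for each query (hypothetical structure)."""
    query: str
    code: str = ""
    result: object = None
    status: str = "failed"  # "ok" or "failed"
    attempts: int = 0
    errors: list = field(default_factory=list)


def run_generated_code(code: str) -> object:
    """Run generated code in a fresh namespace and return its `result` variable."""
    namespace: dict = {}
    exec(code, namespace)  # assumption: a real system would sandbox this step
    return namespace["result"]


def validate(result: object) -> Optional[str]:
    """Toy temporal check: an hour -> trip count mapping must use hours 0-23."""
    if not isinstance(result, dict):
        return "result must be a dict of hour -> count"
    for hour, count in result.items():
        if not 0 <= hour <= 23:
            return f"hour {hour} is outside 0-23"
        if count < 0:
            return f"negative count for hour {hour}"
    return None


def answer(query: str,
           generate: Callable[[str, Optional[str]], str],
           max_attempts: int = 3) -> QueryOutcome:
    """Generate code, execute it, validate the result, and retry with feedback."""
    outcome = QueryOutcome(query=query)
    feedback = None
    for attempt in range(1, max_attempts + 1):
        outcome.attempts = attempt
        outcome.code = generate(query, feedback)
        try:
            result = run_generated_code(outcome.code)
            problem = validate(result)
        except Exception as exc:  # runtime errors become feedback as well
            result, problem = None, f"{type(exc).__name__}: {exc}"
        if problem is None:
            outcome.result, outcome.status = result, "ok"
            return outcome
        outcome.errors.append(problem)
        feedback = problem
    return outcome


def fake_generate(query: str, feedback: Optional[str]) -> str:
    """Stand-in for the model: the first draft is wrong, the second is fixed."""
    if feedback is None:
        return "result = {24: 10, 8: 5}"  # invalid hour 24
    return "result = {0: 10, 8: 5}"


outcome = answer("How many taxi trips start in each hour?", fake_generate)
print(outcome.status, outcome.attempts, outcome.errors)
```

Keeping the error messages alongside the final code and status gives each query a small audit trail, which fits the idea of labeling queries by status, type, and format.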
Deliverables
- Generative AI-powered system designed to interpret complex urban data, tested across multiple datasets, including employment, transit, and demographic data from NYC and nationwide sources. Its average response time is under 10 seconds per query, even when handling larger datasets.
- Technical Report
Datasets
| Source | Dataset | Years |
|---|---|---|
| USCB Longitudinal Employer-Household Dynamics | Job-to-Job Flows (J2J) | 2000–2024 |
| USCB Longitudinal Employer-Household Dynamics | LEHD Origin-Destination Employment Statistics (LODES) | 2002–2022 |
| USCB Longitudinal Employer-Household Dynamics | Post-Secondary Employment Outcomes (PSEO) (Experimental) | 2024–2025 |
| USCB Longitudinal Employer-Household Dynamics | Quarterly Workforce Indicators (QWI) | |
| NYC Taxi & Limousine Commission | TLC Trip Record Data | 2009–2025 |
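To give a sense of the analytic code a natural language query might translate into, the sketch below answers "Which home counties send the most workers to a given work county?" over origin-destination rows. It is an assumption-laden illustration: the column names `w_geocode`, `h_geocode`, and `S000` follow the published LODES origin-destination layout as I understand it, the rows are synthetic and not real LODES values, and `commuters_by_home_county` is a hypothetical helper.

```python
from collections import Counter

# Synthetic rows for illustration only; these are NOT real LODES figures.
# Geocodes are 15-digit census block codes; the first 5 digits give state + county.
rows = [
    {"w_geocode": "360610001001000", "h_geocode": "360470002001000", "S000": 12},
    {"w_geocode": "360610005002000", "h_geocode": "360470002001000", "S000": 8},
    {"w_geocode": "360610001001000", "h_geocode": "360810003001000", "S000": 15},
    {"w_geocode": "360470002001000", "h_geocode": "360610001001000", "S000": 7},
]


def commuters_by_home_county(rows, work_county):
    """Sum total jobs (S000) by home county for jobs located in `work_county`."""
    totals = Counter()
    for row in rows:
        if row["w_geocode"][:5] == work_county:
            totals[row["h_geocode"][:5]] += row["S000"]
    return totals.most_common()  # largest flows first


# 36061 is New York County (Manhattan).
flows = commuters_by_home_county(rows, "36061")
print(flows)
```

Aggregating on the 5-digit prefix is what lets a block-level table answer a county-level question without a separate lookup.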