Vision Language Modeling (VLM) for Urban Climate | NYU Tandon School of Engineering

Transportation & Infrastructure, Urban


Project Sponsor:
  • Qi Sun, Assistant Professor, Department of Computer Science and Engineering, Center for Urban Science + Progress at NYU Tandon

Authors

Yanqing Chen, Yushan Li, Sneha Tirchy Shekar, Jiayi Weng




Research Question

Can Vision-Language Models (VLMs) accurately estimate climate-related indicators (e.g., humidity) from street-view images?


Background

This project aims to address several fundamental challenges in delivering trustworthy and accurate information through LLM-based analysis.




Methodology

Street-level images were retrieved from the Mapillary and Google Street View APIs using a grid-based sampling approach to ensure uniform spatial coverage. Image metadata, including latitude, longitude, and capture date, was collected alongside weather information (temperature and humidity) from the Visual Crossing Weather API. Timestamps were converted to local time, duplicate records were discarded, and image-weather pairs were aligned for consistency and accuracy. The Google Vision API was used to classify objects and urban scenes in the images for additional context.
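The grid sampling and record cleaning steps above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the function names, the record fields (`id`, `captured_at_ms`), the hourly weather lookup keyed by local hour, and the assumption that capture times arrive as UTC epoch milliseconds are all hypothetical choices made for illustration.

```python
from datetime import datetime, timedelta, timezone


def grid_points(south, west, north, east, rows, cols):
    """Return cell-center (lat, lon) pairs for a rows x cols grid over a bounding box."""
    lat_step = (north - south) / rows
    lon_step = (east - west) / cols
    return [
        (south + (r + 0.5) * lat_step, west + (c + 0.5) * lon_step)
        for r in range(rows)
        for c in range(cols)
    ]


def to_local(ts_utc_ms, tz):
    """Convert a UTC epoch timestamp in milliseconds to an aware local datetime."""
    return datetime.fromtimestamp(ts_utc_ms / 1000, tz=timezone.utc).astimezone(tz)


def pair_with_weather(images, weather_by_hour, tz):
    """Drop duplicate image records and attach the weather record for the capture hour.

    `images` is a list of dicts with hypothetical keys "id" and "captured_at_ms";
    `weather_by_hour` maps a local-hour key such as "2024-01-15T12:00" to a dict
    of weather values. Both layouts are assumptions for this sketch.
    """
    seen = set()
    pairs = []
    for img in images:
        if img["id"] in seen:
            continue  # duplicate record, already handled
        seen.add(img["id"])
        local = to_local(img["captured_at_ms"], tz)
        hour_key = local.strftime("%Y-%m-%dT%H:00")
        weather = weather_by_hour.get(hour_key)
        if weather is not None:  # keep only images with a matching weather record
            pairs.append({"id": img["id"], "local_time": local, **weather})
    return pairs
```

Each grid point would then be passed to the image API as a query location. For a daylight-saving-aware conversion, a `zoneinfo.ZoneInfo("America/New_York")` object could be passed as `tz` instead of a fixed offset such as `timezone(timedelta(hours=-5))`.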

For analytical modeling, two open-source VLMs (Qwen2.5-VL and InternVL-2.5) were deployed using Python libraries such as Transformers and PyTorch, with the LMDeploy toolkit handling the inference pipeline. High-performance computing (HPC) resources at NYU, including Singularity containers and GPU processing, were used to speed up execution. The models were configured to process image pairs, compare climate indicators between the two images, and generate structured output. For performance validation, model predictions were compared against ground-truth weather data and human evaluation, and the improvements obtained from prompting techniques and Low-Rank Adaptation (LoRA) fine-tuning were assessed. In post-processing, psychometric models such as the Bradley-Terry model or Thurstone scaling were applied to infer continuous climate indicator estimates from the pairwise comparisons.
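To make the post-processing step concrete, the following is a minimal sketch of fitting a Bradley-Terry model to pairwise outcomes with the standard minorization-maximization (MM) update. It is an illustration, not the project's implementation: the function name and the `wins` matrix layout are assumptions, with `wins[i][j]` counting how often image `i` was judged higher than image `j` on some indicator (for example, more humid). The sketch also assumes every image wins and loses at least one comparison, so the estimates stay finite and non-zero.

```python
def fit_bradley_terry(wins, iters=200, tol=1e-10):
    """Estimate Bradley-Terry strengths from a square matrix of pairwise win counts.

    wins[i][j] is the number of comparisons in which item i beat item j.
    Returns strengths p normalized to sum to 1, where the model assumes
    P(i beats j) = p[i] / (p[i] + p[j]).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            # Items with no comparisons keep their previous strength.
            new_p.append(total_wins / denom if denom > 0 else p[i])
        norm = sum(new_p)
        new_p = [x / norm for x in new_p]
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    return p
```

The fitted strengths only order the images on a relative scale. Turning them into an estimate in physical units (such as percent humidity) would need a further calibration step against ground-truth weather records, for example a regression on the log strengths, which is outside this sketch.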




Deliverables
  • Cross-platform app that delivers data-driven insights on climate metrics (UV index, humidity, temperature, and heat) derived from image data, and that provides prompt engineering suggestions and supplementary research resources, such as maps, for deeper understanding
  • Technical Report



Datasets

Source | Dataset | Years
Google Street View Static API | Metadata, including latitude and longitude for location tracking and capture date for linking images to weather conditions | 2007 – Present
Mapillary | Metadata, including latitude and longitude for location tracking and capture date for linking images to weather conditions | 2013 – Present
Visual Crossing Weather API | Historical weather (temperature and humidity) | 1901 – Present