Vision Language Modeling (VLM) for Urban Climate

Transportation & Infrastructure, Urban

Authors

Yanqing Chen, Yushan Li, Sneha Tirchy Shekar, Jiayi Weng

Research Question

Can Vision-Language Models (VLMs) accurately estimate climate-related indicators (e.g., humidity) from street-view images?

Background

This project aims to address several fundamental challenges in delivering authentic and accurate information through LLM-based analysis.

Methodology

Street-level images were retrieved from the Mapillary and Google Street View APIs using a grid-based approach to ensure uniform spatial coverage. Metadata, including latitude, longitude, and capture date, was collected alongside weather information (temperature and humidity) from the Visual Crossing Weather API. Timestamps were converted to local time, duplicate records were discarded, and image-weather pairs were aligned for consistency and accuracy. The Google Vision API was used to classify objects and urban scenes in the images, providing additional context.
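The grid-based sampling step can be sketched as follows. This is a minimal illustration, not the project's code: the function name `grid_points`, the bounding box, and the 500 m spacing are hypothetical, and it uses a flat-earth approximation of metres per degree that is only reasonable at city scale. Each returned point would then serve as a query location for the image APIs; those requests are not shown.

```python
import math

# Approximate length of one degree of latitude, in metres.
M_PER_DEG_LAT = 111_320.0


def grid_points(min_lat, min_lon, max_lat, max_lon, spacing_m):
    """Return (lat, lon) sample points on a regular grid over a bounding box.

    Points sit at cell centres, clamped to the box edges so none fall outside.
    Uses a flat-earth approximation, which is adequate for city-scale boxes.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    # A degree of longitude shrinks with latitude; use the box's mid-latitude.
    mid_lat = math.radians((min_lat + max_lat) / 2)
    dlat = spacing_m / M_PER_DEG_LAT
    dlon = spacing_m / (M_PER_DEG_LAT * math.cos(mid_lat))
    n_rows = max(1, math.ceil((max_lat - min_lat) / dlat))
    n_cols = max(1, math.ceil((max_lon - min_lon) / dlon))
    points = []
    for i in range(n_rows):
        lat = min(min_lat + (i + 0.5) * dlat, max_lat)
        for j in range(n_cols):
            lon = min(min_lon + (j + 0.5) * dlon, max_lon)
            points.append((round(lat, 6), round(lon, 6)))
    return points


# Hypothetical small bounding box sampled every 500 m.
pts = grid_points(40.70, -74.02, 40.71, -74.00, 500)
print(len(pts), pts[0])
```

Spacing the points by distance rather than by a fixed number of degrees keeps east-west and north-south density comparable, which is what makes the coverage uniform.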

For analytical modeling, two open-source VLMs, Qwen2.5-VL and InternVL-2.5, are deployed using Python libraries such as Transformers and PyTorch, with the LMDeploy toolkit serving the inference pipeline. High-performance computing (HPC) resources at NYU, including Singularity containers and GPU processing, are used to speed up execution. The models are configured to process image pairs, compare climate indicators between the two images, and return structured outputs.

For performance validation, model predictions are compared against ground-truth weather data and human evaluation, and further improvements are explored through prompting techniques and Low-Rank Adaptation (LoRA) fine-tuning. In post-processing, psychometric models such as the Bradley-Terry model or Thurstone scaling are applied to infer continuous climate indicator estimates from the pairwise comparisons.
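As an illustration of the post-processing step, the sketch below fits a Bradley-Terry model with the minorization-maximization (MM) update. It assumes the VLM's structured outputs have already been reduced to (winner, loser) index pairs, meaning "image `winner` appears higher on the indicator than image `loser`". That input format, the function name `bradley_terry`, and the `prior` regularization against a virtual reference item are hypothetical choices, not the project's implementation. Thurstone scaling is not shown.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_items, prior=0.5, iters=500, tol=1e-9):
    """Fit Bradley-Terry strengths from (winner, loser) index pairs via MM updates.

    prior: pseudo-count of virtual wins and virtual losses each item gets against
    a fixed reference item of strength 1. This keeps items with no wins (or no
    losses) finite and fixes the otherwise arbitrary scale.
    Returns log-strengths; higher means "more of" the compared indicator.
    """
    if prior <= 0:
        raise ValueError("prior must be positive")
    wins = [0.0] * n_items
    pair_counts = defaultdict(int)  # unordered pair -> number of comparisons
    for w, l in comparisons:
        wins[w] += 1
        pair_counts[(min(w, l), max(w, l))] += 1

    p = [1.0] * n_items
    for _ in range(iters):
        # Reference term: 2 * prior virtual comparisons against strength 1.
        denom = [2 * prior / (p[i] + 1.0) for i in range(n_items)]
        for (i, j), n in pair_counts.items():
            share = n / (p[i] + p[j])
            denom[i] += share
            denom[j] += share
        # MM update: (observed + virtual wins) / expected-comparison weight.
        new_p = [(wins[i] + prior) / denom[i] for i in range(n_items)]
        delta = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if delta < tol:
            break
    return [math.log(x) for x in p]


# Hypothetical judgments ("which image looks more humid?") over 3 images.
comps = [(2, 1)] * 3 + [(1, 2)] + [(1, 0)] * 3 + [(0, 1)] + [(2, 0)] * 4
print(bradley_terry(comps, 3))
```

The returned log-strengths are relative: they order images on a continuous scale, but mapping them to physical units such as percent relative humidity would need a separate calibration against ground-truth weather, which this sketch does not attempt.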

Deliverables

Cross-platform app that delivers data-driven insights on climate metrics derived from image data (UV index, humidity, temperature, and heat analysis), along with prompt engineering suggestions and supplementary research resources, such as maps, for deeper understanding
Technical Report

Datasets

Source	Dataset	Years
Google	Street View Static API: Metadata including latitude and longitude for location tracking and capture date for linking images to weather conditions.	2007 – Present
Mapillary	Metadata including latitude and longitude for location tracking and capture date for linking images to weather conditions.	2013 – Present
Visual Crossing Weather API	Historical Weather (Temperature and Humidity)	1901 – Present
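Linking images to weather by capture date, together with the local-time conversion and deduplication described in the Methodology, might look like the sketch below. The record layout (`id`, `captured_at_ms` as UTC epoch milliseconds, and a daily weather lookup keyed by ISO date) and the `America/New_York` default time zone are assumptions for illustration, not the actual schemas returned by these APIs.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def pair_images_with_weather(images, daily_weather, tz_name="America/New_York"):
    """Attach daily weather to each image by its local capture date.

    images: list of dicts with hypothetical keys "id" and "captured_at_ms"
        (capture time as UTC epoch milliseconds); other keys pass through.
    daily_weather: hypothetical dict mapping "YYYY-MM-DD" (local date) to a
        dict such as {"temp": ..., "humidity": ...}.
    """
    tz = ZoneInfo(tz_name)
    seen_ids = set()
    pairs = []
    for img in images:
        # Discard duplicate records, e.g. one image returned for two grid cells.
        if img["id"] in seen_ids:
            continue
        seen_ids.add(img["id"])
        # Convert the UTC timestamp to local time before taking the date.
        utc_dt = datetime.fromtimestamp(img["captured_at_ms"] / 1000, tz=timezone.utc)
        local_date = utc_dt.astimezone(tz).date().isoformat()
        weather = daily_weather.get(local_date)
        if weather is None:
            # No weather record for that day: skip rather than guess.
            continue
        pairs.append({**img, "local_date": local_date, **weather})
    return pairs
```

Joining on the local date rather than the UTC date matters for images captured in the evening, when the two dates can differ. A source that reports only a month-level capture date would need a coarser join than this one.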
