Leveraging Multimodal LLMs for Built Environment and Housing Attribute Assessment from Street-View Imagery

arXiv cs.CV / 4/24/2026



Key Points

The paper introduces a framework that uses multimodal LLMs with Google Street View imagery to automatically assess building conditions across the United States.
Fine-tuning Gemma 3 27B on a relatively small human-labeled dataset yields strong agreement with human mean opinion scores (MOS). Measured by SRCC and PLCC against the MOS benchmark, the model outperforms individual human raters.
To reduce latency and cost, the authors use knowledge distillation to transfer the capabilities of Gemma 3 27B into a Gemma 3 4B model that runs roughly 3x faster while maintaining comparable accuracy.
They further distill the model into CNN- and transformer-based variants (EfficientNetV2-M and SwinV2-B), achieving near-original performance with about a 30x speed gain.
The work also evaluates LLMs on a broad set of built-environment and housing attributes through a human-AI alignment study, and develops a visualization dashboard that integrates the LLM assessment outcomes for downstream analysis by homeowners.
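SRCC (Spearman rank-order correlation) and PLCC (Pearson linear correlation) are the usual agreement metrics when comparing predicted scores with MOS. The following is a minimal sketch, not the authors' evaluation code. It assumes MOS is the plain per-image mean of rater scores and that PLCC is computed on raw scores, without the nonlinear (logistic) mapping some quality-assessment benchmarks apply first. The function names and the ratings are hypothetical toy values, not data from the paper.

```python
from math import sqrt
from statistics import mean

def plcc(x, y):
    """Pearson linear correlation coefficient between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def srcc(x, y):
    """Spearman rank correlation: PLCC computed on tie-averaged ranks."""
    return plcc(average_ranks(x), average_ranks(y))

def mos(ratings):
    """Mean opinion score per image; ratings[i] holds every rater's score for image i."""
    return [mean(r) for r in ratings]

# Toy data (hypothetical): 3 raters score 5 images on a 1-5 condition scale.
ratings = [[4, 5, 4], [2, 2, 3], [3, 3, 3], [1, 2, 1], [5, 4, 5]]
model_scores = [4.3, 2.6, 2.9, 1.2, 4.8]  # hypothetical model predictions

benchmark = mos(ratings)
print("model   SRCC=%.3f PLCC=%.3f" % (srcc(model_scores, benchmark), plcc(model_scores, benchmark)))
for r in range(3):
    rater = [img[r] for img in ratings]
    print("rater %d SRCC=%.3f PLCC=%.3f" % (r, srcc(rater, benchmark), plcc(rater, benchmark)))
```

In this toy setup each rater is also part of the MOS they are compared against, which flatters the raters; a leave-one-out MOS is a common alternative, and which variant the paper uses is not stated here.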

Abstract

We present a novel framework for automatically evaluating building conditions nationwide in the United States by leveraging large language models (LLMs) and Google Street View (GSV) imagery. By fine-tuning Gemma 3 27B on a modest human-labeled dataset, our approach achieves strong alignment with human mean opinion scores (MOS), outperforming even individual raters on SRCC and PLCC relative to the MOS benchmark. To enhance efficiency, we apply knowledge distillation, transferring the capabilities of Gemma 3 27B to a smaller Gemma 3 4B model that achieves comparable performance with a 3x speedup. We further distill the knowledge into a CNN-based model (EfficientNetV2-M) and a transformer-based model (SwinV2-B), which deliver close performance with a 30x speed gain. In addition, we investigate the capabilities of LLMs for assessing an extensive list of built environment and housing attributes through a human-AI alignment study, and develop a visualization dashboard that integrates LLM assessment outcomes for downstream analysis by homeowners. Our framework offers a flexible and efficient solution for large-scale building condition assessment, enabling high accuracy with minimal human labeling effort.
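The abstract does not specify the distillation objective. One common choice, shown here purely as an illustrative assumption, is Hinton-style soft-target distillation: the teacher (for example, the fine-tuned 27B model) yields a probability distribution over discrete condition scores, and the student (a 4B LLM, or an EfficientNetV2-M/SwinV2-B image model with a classification head) is trained to match it through temperature-scaled KL divergence, optionally blended with cross-entropy on human labels. The function names, the 1-5 score scale, the temperature, and `alpha` are all hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return kl * temperature ** 2

def cross_entropy(student_logits, label_index):
    """Standard cross-entropy against a hard (human) label."""
    return -math.log(softmax(student_logits)[label_index])

def distillation_objective(student_logits, teacher_logits, label_index=None,
                           alpha=0.5, temperature=2.0):
    """Blend soft-target KD with hard-label CE; unlabeled images use KD only."""
    soft = kd_loss(student_logits, teacher_logits, temperature)
    if label_index is None:
        return soft
    return alpha * cross_entropy(student_logits, label_index) + (1 - alpha) * soft

def expected_score(logits, scores=(1, 2, 3, 4, 5)):
    """Collapse a score distribution into one continuous condition score."""
    return sum(p * s for p, s in zip(softmax(logits), scores))

# Hypothetical logits over condition scores 1..5 for one street-view image.
teacher = [0.1, 0.4, 2.0, 1.2, -0.5]
student = [0.0, 0.2, 1.0, 1.0, 0.0]
print(distillation_objective(student, teacher))                 # unlabeled image
print(distillation_objective(student, teacher, label_index=2))  # labeled image
print(expected_score(teacher), expected_score(student))
```

Reading out the expected score rather than the argmax keeps predictions continuous, which suits rank and linear correlation against MOS; this, too, is an assumption rather than a detail from the paper.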
