How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits

arXiv cs.CV / 4/27/2026

Key Points

The paper addresses a key limitation of street-view perception models: they can predict subjective attributes like safety at scale, but they do not causally identify which localized visual edits would plausibly change human judgement for a given scene.
It proposes a “lever-based” interventional counterfactual framework that turns scene-level explainability into a constrained search over structured, localized counterfactual edits.
Each lever is defined by a semantic concept, a spatial support, an intervention direction, and a constrained edit template. Candidate edits are generated via prompt-conditioned image editing and kept only if they pass validity checks for same-place preservation, locality, realism, and plausibility.
In a pilot study across 50 scenes from five cities, the method surfaces preliminary, proxy-based directional patterns and a practical failure taxonomy for prompt-only editing, with Mobility Infrastructure and Physical Maintenance producing the largest auxiliary safety shifts.
The authors note that human pairwise judgements will serve as the ground-truth endpoint for future validation of the counterfactual explanations.
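The lever definition and validity filtering described above can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Direction`, `Lever`, `Candidate` and `failed_checks`, the bounding-box representation of spatial support, the scalar check scores and every threshold value are hypothetical. Returning the list of failed checks, rather than a single boolean, is one way a failure taxonomy could be tallied.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Hypothetical intervention directions (an assumption, not from the paper)."""
    INCREASE = "add"
    DECREASE = "remove"


@dataclass(frozen=True)
class Lever:
    concept: str                       # semantic concept, e.g. "street lighting"
    region: tuple[int, int, int, int]  # spatial support as a box (x0, y0, x1, y1); assumed representation
    direction: Direction               # intervention direction
    template: str                      # constrained edit template with {action} and {concept} slots

    def prompt(self) -> str:
        """Fill the template to get the text prompt for a (hypothetical) image editor."""
        return self.template.format(action=self.direction.value, concept=self.concept)


@dataclass
class Candidate:
    lever: Lever
    same_place: float  # similarity of the unedited area to the original scene, 0..1 (assumed score)
    locality: float    # share of changed pixels inside lever.region, 0..1 (assumed score)
    realism: float     # realism score, 0..1 (assumed score)
    plausible: bool    # whether the change could plausibly happen at this site


# Illustrative thresholds only; the paper's actual criteria are not given in this summary.
MIN_SAME_PLACE, MIN_LOCALITY, MIN_REALISM = 0.85, 0.90, 0.50


def failed_checks(c: Candidate) -> list[str]:
    """Return the names of the validity checks a candidate fails (empty list = keep it)."""
    failures = []
    if c.same_place < MIN_SAME_PLACE:
        failures.append("same-place preservation")
    if c.locality < MIN_LOCALITY:
        failures.append("locality")
    if c.realism < MIN_REALISM:
        failures.append("realism")
    if not c.plausible:
        failures.append("plausibility")
    return failures


if __name__ == "__main__":
    lever = Lever("street lighting", (120, 40, 260, 300), Direction.INCREASE,
                  "{action} {concept} inside the masked region; keep everything else unchanged")
    print(lever.prompt())
    candidates = [
        Candidate(lever, same_place=0.92, locality=0.95, realism=0.70, plausible=True),
        Candidate(lever, same_place=0.92, locality=0.60, realism=0.70, plausible=True),
    ]
    for cand in candidates:
        print(failed_checks(cand) or "kept")
```

Here the second candidate would be rejected for locality, because too much of its change falls outside the lever's region.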

Abstract

Street-view perception models predict subjective attributes such as safety at scale, but remain correlational: they do not identify which localized visual changes would plausibly shift human judgement for a specific scene. We propose a lever-based interventional counterfactual framework that recasts scene-level explainability as a bounded search over structured counterfactual edits. Each lever specifies a semantic concept, spatial support, intervention direction, and constrained edit template. Candidate edits are generated through prompt-conditioned image editing and retained only if they satisfy validity checks for same-place preservation, locality, realism, and plausibility. In a pilot across 50 scenes from five cities, the framework reveals preliminary proxy-based directional patterns and a practical failure taxonomy under prompt-only editing, with Mobility Infrastructure and Physical Maintenance showing the largest auxiliary safety shifts. Human pairwise judgements remain the ground-truth endpoint for future validation.
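The bounded search and the per-family comparison of proxy shifts in the abstract can be illustrated with a toy pipeline. This is a hedged sketch, not the paper's method: `bounded_search`, `mean_shift_by_family`, `rank_families` and `toy_edit_and_score` are hypothetical names; the toy function stands in for a real prompt-conditioned image editor, the validity checks and an auxiliary safety model; the "Signage" family is invented for illustration; and every number is made up. It shows only the bookkeeping: cap the number of levers tried per scene, drop invalid edits, then rank lever families by the magnitude of their mean proxy shift.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Iterable


@dataclass(frozen=True)
class EditResult:
    scene_id: str
    family: str          # lever family, e.g. "Mobility Infrastructure"
    valid: bool          # True if the edit passed all validity checks
    proxy_before: float  # auxiliary safety-model score on the original image
    proxy_after: float   # auxiliary safety-model score on the edited image


EditFn = Callable[[str, str, str], EditResult]  # (scene_id, family, prompt) -> result


def bounded_search(scene_id: str, levers: Iterable[tuple[str, str]],
                   edit_and_score: EditFn, budget: int) -> list[EditResult]:
    """Try at most `budget` (family, prompt) levers on one scene."""
    results: list[EditResult] = []
    for family, prompt in levers:
        if len(results) >= budget:
            break
        results.append(edit_and_score(scene_id, family, prompt))
    return results


def mean_shift_by_family(results: Iterable[EditResult]) -> dict[str, float]:
    """Average proxy shift (after minus before) per family, counting only valid edits."""
    shifts: dict[str, list[float]] = defaultdict(list)
    for r in results:
        if r.valid:
            shifts[r.family].append(r.proxy_after - r.proxy_before)
    return {family: mean(values) for family, values in shifts.items()}


def rank_families(shifts: dict[str, float]) -> list[str]:
    """Order families by absolute mean shift, largest first."""
    return sorted(shifts, key=lambda f: abs(shifts[f]), reverse=True)


# Toy stand-in for editing + validity checks + safety proxy. All numbers are made up.
TOY_DELTA = {"Mobility Infrastructure": 0.30, "Physical Maintenance": -0.20, "Signage": 0.05}


def toy_edit_and_score(scene_id: str, family: str, prompt: str) -> EditResult:
    valid = "sky" not in prompt  # toy rule: pretend edits touching the sky fail the locality check
    return EditResult(scene_id, family, valid, 0.5, 0.5 + TOY_DELTA[family])


if __name__ == "__main__":
    levers = [
        ("Mobility Infrastructure", "add a painted crosswalk"),
        ("Physical Maintenance", "add graffiti to the wall"),
        ("Signage", "brighten the sky behind the sign"),
    ]
    all_results: list[EditResult] = []
    for scene in ["scene_a", "scene_b"]:
        all_results += bounded_search(scene, levers, toy_edit_and_score, budget=3)
    print(rank_families(mean_shift_by_family(all_results)))
    # prints ['Mobility Infrastructure', 'Physical Maintenance']
```

Ranking by absolute mean shift treats a large decrease in the proxy score as seriously as a large increase. Because these are proxy-based patterns, any such ranking would still need to be checked against human pairwise judgements.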
