How Many Visual Levers Drive Urban Perception? Interventional Counterfactuals via Multiple Localised Edits
arXiv cs.CV · 27 Apr 2026
Key Points
- The paper addresses a key limitation of street-view perception models: they can predict subjective attributes such as safety at scale, but they cannot causally identify which localized visual edits would plausibly change human judgement of a given scene.
- It proposes a “lever-based” interventional counterfactual framework that turns scene-level explainability into a constrained search over structured, localized counterfactual edits.
- Each lever is defined by a semantic concept plus spatial support and an intervention direction, and candidate edits are generated via prompt-conditioned image editing while being filtered through validity checks (same-place preservation, locality, realism, and plausibility).
- In a pilot study across 50 scenes from five cities, the method surfaces preliminary directional patterns and a failure taxonomy for prompt-only editing, with Mobility Infrastructure and Physical Maintenance producing the largest auxiliary safety shifts.
- The authors note that human pairwise judgements will serve as the ground-truth endpoint for future validation of the counterfactual explanations.
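The lever structure and validity filtering described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the bounding-box representation of spatial support, and the 0.5 score threshold are all assumptions made for the example; the four check names mirror the validity criteria listed in the key points.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Lever:
    """A localized intervention: semantic concept + spatial support + direction."""
    concept: str            # e.g. "Mobility Infrastructure" (hypothetical value)
    region: tuple           # spatial support as a bounding box (x0, y0, x1, y1)
    direction: str          # intervention direction, e.g. "add" or "remove"

@dataclass
class CandidateEdit:
    """A prompt-conditioned edit proposal with scores from validity checks."""
    lever: Lever
    checks: dict = field(default_factory=dict)  # check name -> score in [0, 1]

# The four validity criteria named in the paper's summary.
REQUIRED_CHECKS = ("same_place", "locality", "realism", "plausibility")

def is_valid(edit: CandidateEdit, threshold: float = 0.5) -> bool:
    """An edit survives only if every required check clears the threshold."""
    return all(edit.checks.get(c, 0.0) >= threshold for c in REQUIRED_CHECKS)

def filter_edits(edits, threshold: float = 0.5):
    """Keep only candidate edits that pass all validity checks."""
    return [e for e in edits if is_valid(e, threshold)]
```

Under this sketch, the counterfactual search reduces to generating many `CandidateEdit`s per lever and discarding those that fail any check, leaving a small set of valid edits to score against the auxiliary safety model.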