LLM-Powered Flood Depth Estimation from Social Media Imagery: A Vision-Language Model Framework with Mechanistic Interpretability for Transportation Resilience
arXiv cs.CV / 3/19/2026
Key Points
- FloodLlama is a fine-tuned open-source vision-language model for real-time, centimeter-resolution flood-depth estimation from single street-level images, supported by a multimodal TikTok data pipeline.
- LLaMA 3.2-11B Vision was fine-tuned with QLoRA and progressive curriculum learning on a synthetic dataset of about 190,000 images spanning seven vehicle types, four weather conditions, and 41 depth levels (0-40 cm at 1 cm resolution); a minimal QLoRA setup is sketched after this list.
- Evaluation across 34,797 trials shows depth-dependent prompt effects: simple prompts excel at shallow depths while chain-of-thought reasoning improves performance at greater depths (see the prompt-routing sketch below); for deep flooding, mean absolute error stays below 0.97 cm and Acc@5cm exceeds 93.7%.
- A five-phase mechanistic interpretability framework identifies layer L23 as the critical depth-encoding transition, enabling selective fine-tuning that cuts trainable parameters by 76-80% without losing accuracy (see the layer-freezing sketch below).
- The Tier 3 configuration achieves 98.62% real-world accuracy, remains robust under occlusion, and was validated on 676 flood frames from Detroit, demonstrating the feasibility of real-time, crowd-sourced flood sensing.
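
For readers who want a concrete picture of the training setup, below is a minimal QLoRA sketch for Llama 3.2-11B Vision using Hugging Face transformers and peft. The rank, dropout, and target modules are illustrative assumptions (the summary does not specify them), and the curriculum schedule that orders examples from shallow to deep depths is omitted.

```python
# Minimal QLoRA setup sketch for Llama 3.2-11B Vision. Hyperparameters and
# target modules are illustrative assumptions, not the paper's exact values.
import torch
from transformers import MllamaForConditionalGeneration, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit NF4 quantization: the "Q" in QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                   # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```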
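
The depth-dependent prompt finding suggests a simple routing rule: use a terse prompt when a coarse first-pass estimate indicates shallow water, and a chain-of-thought prompt otherwise. The sketch below is an assumption about how such routing could be wired; the 20 cm threshold and both templates are hypothetical, not taken from the paper.

```python
# Depth-conditioned prompt selection, reflecting the reported finding that
# simple prompts work best for shallow water while chain-of-thought reasoning
# helps at greater depths. Threshold and templates are illustrative.
SIMPLE_PROMPT = (
    "Estimate the flood water depth in this image in centimeters. "
    "Answer with a single number."
)
COT_PROMPT = (
    "Look at the vehicles in this image. Reason step by step about which "
    "reference points (tires, bumper, door sill) the water reaches, then "
    "estimate the flood depth in centimeters."
)

def select_prompt(coarse_depth_cm: float, threshold_cm: float = 20.0) -> str:
    """Route to a simple or chain-of-thought prompt based on a coarse depth
    prior, e.g. from a cheap first-pass estimate."""
    return SIMPLE_PROMPT if coarse_depth_cm < threshold_cm else COT_PROMPT

print(select_prompt(8.0))   # shallow -> simple prompt
print(select_prompt(33.0))  # deep -> chain-of-thought prompt
```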
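
Interpretability-guided selective fine-tuning can be approximated by freezing every decoder layer below the identified transition (L23) and training only the layers from that point up. The sketch below assumes the Hugging Face Mllama parameter naming (`language_model.model.layers.<index>.`); the paper's actual selection criteria may differ.

```python
# Freeze all parameters, then re-enable only decoder layers at or above the
# depth-encoding transition layer (L23 per the paper's analysis). The naming
# pattern is an assumption based on the Hugging Face Mllama layout.
import re

def freeze_below_transition(model, transition_layer: int = 23) -> None:
    layer_idx = re.compile(r"language_model\.model\.layers\.(\d+)\.")
    for name, param in model.named_parameters():
        match = layer_idx.search(name)
        # Trainable only for decoder layers >= transition; embeddings, the
        # vision tower, and earlier decoder layers stay frozen.
        param.requires_grad = bool(match and int(match.group(1)) >= transition_layer)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable / total:.1%} of {total:,} parameters")

# Usage (with a loaded model): freeze_below_transition(model, transition_layer=23)
```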