LLM-Powered Flood Depth Estimation from Social Media Imagery: A Vision-Language Model Framework with Mechanistic Interpretability for Transportation Resilience
arXiv cs.CV / 3/19/2026
Key Points
- FloodLlama is a fine-tuned open-source vision-language model for real-time, centimeter-resolution flood-depth estimation from single street-level images, supported by a multimodal TikTok data pipeline.
- The model was trained on a synthetic dataset of about 190,000 images spanning seven vehicle types, four weather conditions, and 41 depth levels (0-40 cm at 1 cm resolution), using progressive curriculum learning and QLoRA to fine-tune LLaMA 3.2-11B Vision (see the QLoRA sketch after this list).
- Evaluation across 34,797 trials shows depth-dependent prompt effects: simple prompts excel at shallow depths while chain-of-thought reasoning improves performance at greater depths (illustrated in the prompting sketch below); for deep flooding, MAE stays below 0.97 cm and Acc@5cm exceeds 93.7%.
- A five-phase mechanistic interpretability framework identifies layer L23 as the critical depth-encoding transition and enables selective fine-tuning that cuts trainable parameters by 76-80% while maintaining accuracy (see the layer-freezing sketch below).
- The Tier 3 configuration reaches 98.62% real-world accuracy and remains robust under occlusion; validation on 676 flood frames from Detroit demonstrates the feasibility of real-time, crowd-sourced deployment.
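
The paper's exact QLoRA fine-tuning setup is not reproduced here, but the general recipe of 4-bit quantization plus low-rank adapters can be sketched with Hugging Face `transformers` and `peft`. The rank, alpha, dropout, and target modules below are illustrative assumptions, not the paper's reported hyperparameters:

```python
import torch
from transformers import BitsAndBytesConfig, MllamaForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization so the 11B vision model fits on a single GPU (the QLoRA setup).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = MllamaForConditionalGeneration.from_pretrained(
    "meta-llama/Llama-3.2-11B-Vision-Instruct",  # assumed base checkpoint
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Low-rank adapters on the attention projections; r and alpha are illustrative.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the adapter weights train
```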
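A minimal sketch of the depth-dependent prompting finding: the two templates and the 20 cm switching threshold below are hypothetical illustrations of the simple-versus-chain-of-thought trade-off, not the paper's actual prompts:

```python
# Hypothetical prompt templates illustrating the reported depth-dependent effect.
SIMPLE_PROMPT = (
    "Estimate the flood water depth in this street-level image. "
    "Answer with a single number in centimeters."
)
COT_PROMPT = (
    "Reason step by step: identify vehicles in this street-level image, "
    "note which reference features (wheel hub, bumper, door sill) are "
    "submerged, then estimate the flood water depth in centimeters."
)

def choose_prompt(coarse_depth_cm: float, switch_at_cm: float = 20.0) -> str:
    """Pick a template from a coarse first-pass depth estimate.

    The 20 cm threshold is an assumption, not a value from the paper.
    """
    return SIMPLE_PROMPT if coarse_depth_cm < switch_at_cm else COT_PROMPT
```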
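The selective fine-tuning idea can be sketched as freezing everything outside a band of decoder blocks around the reported transition layer. The band radius and the module path in the usage comment are assumptions:

```python
import torch.nn as nn

def freeze_except_band(model: nn.Module, decoder_layers, center: int = 23,
                       radius: int = 3) -> None:
    """Freeze all weights, then unfreeze only decoder blocks within `radius`
    of the depth-encoding transition layer (L23 in the paper). The band
    width is an assumption chosen to mimic the reported 76-80% reduction
    in trainable parameters."""
    for p in model.parameters():
        p.requires_grad = False
    for i, block in enumerate(decoder_layers):
        if abs(i - center) <= radius:
            for p in block.parameters():
                p.requires_grad = True

# Usage (the module path is an assumption for a LLaMA-style VLM):
# freeze_except_band(model, model.language_model.model.layers)
```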
Related Articles

Reducing veterans' burden of training junior engineers: generating PLC control "ladder diagrams" with AI
日経XTECH

Hey dev.to community – sharing my journey with Prompt Builder, Insta Posts, and practical SEO
Dev.to

Why Regex is Not Enough: Building a Deterministic "Sudo" Layer for AI Agents
Dev.to

Perplexity Hub
Dev.to

How to Build Passive Income with AI in 2026: A Developer's Practical Guide
Dev.to