TSHA: A Benchmark for Visual Language Models in Trustworthy Safety Hazard Assessment Scenarios
arXiv cs.CV / 4/1/2026
Key Points
- The paper proposes TSHA (Trustworthy Safety Hazard Assessment), a new benchmark for evaluating vision-language models (VLMs) on indoor safety hazard assessment.
- It addresses limitations of prior benchmarks by reducing the synthetic-to-real domain gap, expanding safety tasks beyond oversimplified constraints, and introducing more rigorous evaluation protocols.
- TSHA includes 81,809 curated training samples sourced from existing indoor datasets, internet images, AIGC images, and newly captured images to better reflect real environments.
- The benchmark’s challenging test set (1,707 samples) contains videos and panoramic images with multiple simultaneous hazards to measure robustness in complex home safety contexts.
- Experiments across 23 VLMs show current models perform poorly on safety hazard assessment, while training on TSHA improves results by up to +18.3 points and boosts generalizability on other benchmarks.
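The summary does not specify TSHA's scoring protocol, so the following is only a minimal, hypothetical sketch of how a multi-hazard assessment might be scored: each test sample carries a set of reference hazard labels, a model proposes its own set, and a set-level F1 is averaged over samples. The `hazard_f1` and `evaluate` names, the sample schema, and the metric choice are all assumptions, not the paper's actual method.

```python
# Hypothetical sketch of a multi-hazard evaluation loop; TSHA's actual
# protocol, sample schema, and metrics are not given in this summary.

def hazard_f1(predicted: set, reference: set) -> float:
    """Set-level F1 between predicted and reference hazard labels."""
    if not predicted and not reference:
        return 1.0  # both empty: perfect agreement
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)

def evaluate(samples, model_fn):
    """Average per-sample F1 of model_fn over samples shaped like
    {"image": ..., "hazards": set-of-labels}."""
    scores = [
        hazard_f1(set(model_fn(s["image"])), set(s["hazards"]))
        for s in samples
    ]
    return sum(scores) / len(scores)
```

Averaging a per-sample F1 (rather than exact-match accuracy) gives partial credit when a model finds some but not all of the simultaneous hazards in a scene, which matters for test samples like TSHA's multi-hazard videos and panoramas.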