ICAT: Incident-Case-Grounded Adaptive Testing for Physical-Risk Prediction in Embodied World Models

arXiv cs.RO / 4/21/2026

Key Points

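Video-generative world models are used for embodied planning and policy learning, but their ability to predict the physical risks and severe consequences of hazardous actions has not been adequately evaluated.
The study finds that major world models overlook danger cues and severe outcomes, which can encourage unsafe preferences when planning and learning on imagined rollouts.
In response, it proposes ICAT, a method that structures risk memories grounded in real incident reports and safety manuals and uses them to generate and constrain risk cases annotated with causal chains and severity labels.
Experiments on an ICAT-based benchmark show that many common world models confuse mechanisms and triggering conditions and estimate severity poorly, falling short of the reliability required for safety-critical embodied rollouts.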

Abstract

Video-generative world models are increasingly used as neural simulators for embodied planning and policy learning, yet their ability to predict physical risk and severe consequences is rarely evaluated. We find that these models often downplay or omit key danger cues and severe outcomes for hazardous actions, which can induce unsafe preferences during planning and training on imagined rollouts. We propose ICAT, which grounds testing in real incident reports and safety manuals by building structured risk memories and retrieving/composing them to constrain the generation of risk cases with causal chains and severity labels. Experiments on an ICAT-based benchmark show that mainstream world models frequently miss mechanisms and triggering conditions and miscalibrate severity, falling short of the reliability required for safety-critical embodied deployment.
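The abstract does not spell out ICAT's data format or retrieval procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a structured risk memory is retrieved and composed into a test case carrying a causal chain and a severity label, and a model's predicted severity is compared against that label. The `RiskMemory` schema, the word-overlap retrieval, the 1 to 5 severity scale, the mean-absolute-error score, and the two example entries are all hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskMemory:
    """Hypothetical schema for one entry distilled from an incident report or manual."""
    action: str
    trigger: str
    mechanism: str
    outcome: str
    severity: int  # assumed ordinal scale: 1 (minor) to 5 (catastrophic)


def retrieve(memories, query, k=1):
    """Rank memories by word overlap with the query (a toy stand-in for real retrieval)."""
    words = set(query.lower().split())
    ranked = sorted(
        memories,
        key=lambda m: len(words & set(m.action.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def compose_case(memory):
    """Turn a memory into a test case with a causal chain and a severity label."""
    return {
        "prompt": memory.action,
        "causal_chain": [memory.trigger, memory.mechanism, memory.outcome],
        "severity": memory.severity,
    }


def severity_error(predicted, labeled):
    """Mean absolute gap between predicted and labeled severity (assumed metric)."""
    return sum(abs(p - l) for p, l in zip(predicted, labeled)) / len(labeled)


# Made-up illustrative entries; severity values are assumptions, not benchmark data.
MEMORIES = [
    RiskMemory(
        action="pour water onto burning oil in a pan",
        trigger="water contacts oil that is hotter than the boiling point of water",
        mechanism="water flashes to steam and ejects burning oil",
        outcome="fireball and burn injuries",
        severity=5,
    ),
    RiskMemory(
        action="carry a stack of boxes down stairs",
        trigger="boxes block the view of the steps",
        mechanism="a missed step causes loss of balance",
        outcome="fall injury",
        severity=3,
    ),
]

case = compose_case(retrieve(MEMORIES, "robot pours water on an oil fire in the pan")[0])
```

In this sketch, a world model's rollout for `case["prompt"]` would be checked for whether it depicts each step of `case["causal_chain"]`, and its implied severity would be scored against `case["severity"]` with something like `severity_error`.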

