Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

arXiv cs.AI / 5/5/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

Multi-Hop Fact Verification (MHFV) requires stitching together evidence across multiple steps, and LLMs often fail due to hallucinations and broken reasoning chains.
The paper proposes grounding verification in a Structural Causal Model (SCM), framing claim verification as a constructive causal inference process rather than relying only on Chain-of-Thought.
Experiments reveal an “inverted U-shaped” relationship between reasoning chain length/structural complexity and accuracy, where too much complexity reduces performance.
To manage this trade-off, the authors introduce a rule-based reinforcement learning approach using Group Relative Policy Optimization (GRPO) to balance structural depth and conciseness.
Results on HoVer and EX-FEVER show the proposed SCM-GRPO framework substantially outperforms existing baselines while remaining more interpretable and reliable.

Abstract

Multi-Hop Fact Verification (MHFV) necessitates complex reasoning across disparate evidence, posing significant challenges for Large Language Models (LLMs) which often suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought (CoT), lack explicit modeling of the causal dependencies between evidence and claims. In this work, we introduce a novel framework that grounds reasoning in a Structural Causal Model (SCM), treating verification as a constructive causal inference process. We empirically identify an "inverted U-shaped" correlation between reasoning chain length and accuracy, revealing that excessive structural complexity degrades performance. To address this, we propose a Rule-based Reinforcement Learning strategy using Group Relative Policy Optimization (GRPO). This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework significantly outperforms state-of-the-art baselines, offering a reliable and interpretable solution for complex fact verification.

When Claims Freeze Because a Provider Record Drifted: The Case for Enrollment Repair Agents

Dev.to

The Cash Is Already Earned: Why Construction Pay Application Exceptions Fit an Agent Better Than SaaS

Dev.to

Why Ship-and-Debit Claim Recovery Is a Better Agent Wedge Than Another “AI Back Office” Tool

Dev.to

AI is getting better at doing things, but still bad at deciding what to do?

Reddit r/artificial

I Built an AI-Powered Chinese BaZi (八字) Fortune Teller — Here's What DeepSeek Revealed About Destiny

Dev.to

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

Key Points

Abstract

Related Articles

When Claims Freeze Because a Provider Record Drifted: The Case for Enrollment Repair Agents

The Cash Is Already Earned: Why Construction Pay Application Exceptions Fit an Agent Better Than SaaS

Why Ship-and-Debit Claim Recovery Is a Better Agent Wedge Than Another “AI Back Office” Tool

AI is getting better at doing things, but still bad at deciding what to do?

I Built an AI-Powered Chinese BaZi (八字) Fortune Teller — Here's What DeepSeek Revealed About Destiny

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer