Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

arXiv cs.AI / 3/18/2026



Key Points

The authors present a causal evaluation protocol to test whether intermediate structures in schema-guided LLM reasoning causally determine final outputs.
Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures, yet fail to update their predictions after an intervention in up to 60% of cases, revealing that this apparent faithfulness is fragile.
When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears, suggesting that the structure can influence the outcome but does not reliably mediate it.
Prompts that emphasize the intermediate structure over the original input do not materially close the gap, indicating intermediate structures act as influential context rather than stable causal mediators.
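The protocol relies on tasks where a deterministic function maps the intermediate structure to the decision, so that any controlled edit implies exactly one correct output. The minimal Python sketch below illustrates that idea under stated assumptions: the structure is a checklist of boolean criteria, and the decision rule (accept only if every item holds) is a hypothetical stand-in. The names `Checklist`, `decide`, `Intervention`, and `flip_interventions` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical intermediate structure: each criterion mapped to whether
# the model judged it satisfied.
Checklist = Dict[str, bool]


def decide(checklist: Checklist) -> str:
    """Assumed deterministic decision rule: accept only if every item holds."""
    return "accept" if all(checklist.values()) else "reject"


@dataclass
class Intervention:
    edited: Checklist  # structure after one controlled edit
    expected: str      # the unique output that edit implies under `decide`


def flip_interventions(checklist: Checklist) -> List[Intervention]:
    """Create one controlled edit per criterion by negating that single item.

    Because `decide` is deterministic, each edited structure implies
    exactly one correct decision, which becomes the expected output.
    """
    interventions = []
    for key in checklist:
        edited = dict(checklist)      # copy so the original stays untouched
        edited[key] = not edited[key]
        interventions.append(Intervention(edited=edited, expected=decide(edited)))
    return interventions
```

For example, flipping either item of a fully satisfied two-item checklist implies `"reject"`, so a model that truly uses the structure as a mediator should switch its answer when conditioned on the edited checklist.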

Abstract

Schema-guided reasoning pipelines ask LLMs to produce explicit intermediate structures (rubrics, checklists, verification queries) before committing to a final decision. But do these structures causally determine the output, or merely accompany it? We introduce a causal evaluation protocol that makes this directly measurable: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across eight models and three benchmarks, models appear self-consistent with their own intermediate structures but fail to update their predictions after intervention in up to 60% of cases, revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; however, prompts that ask the model to prioritize the intermediate structure over the original input do not materially close the gap. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.
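The contrast the abstract draws, self-consistency versus updating after an intervention, can be scored as two simple rates. The sketch below is a hedged illustration, not the paper's metric code: it assumes one trial per intervention, a caller-supplied deterministic rule `decide`, and defines a failure as a post-intervention prediction that disagrees with the output implied by the edited structure. The names `Trial` and `evaluate` are hypothetical, and the paper's exact definitions may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical intermediate structure: criterion name -> satisfied?
Checklist = Dict[str, bool]


@dataclass
class Trial:
    original: Checklist  # structure the model produced itself
    original_pred: str   # model's decision alongside that structure
    edited: Checklist    # structure after a controlled intervention
    edited_pred: str     # model's decision when conditioned on the edit


def evaluate(trials: List[Trial], decide: Callable[[Checklist], str]) -> Dict[str, float]:
    """Return two rates under the assumed protocol.

    self_consistency: share of trials whose original prediction equals
        decide(original structure).
    intervention_failure: share of trials whose post-edit prediction
        differs from decide(edited structure), i.e. the implied output.
    """
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    consistent = sum(t.original_pred == decide(t.original) for t in trials)
    failed = sum(t.edited_pred != decide(t.edited) for t in trials)
    return {"self_consistency": consistent / n, "intervention_failure": failed / n}
```

In the external-tool setting, the final decision would be computed as `decide(t.edited)` instead of being generated by the model, so in this idealized sketch the failure rate is zero by construction. A model can thus score perfectly on self-consistency while still showing a high intervention-failure rate, which is the gap the abstract describes.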
