AI Navigate

Not Just the Destination, But the Journey: Reasoning Traces Causally Shape Generalization Behaviors

arXiv cs.CL / 3/16/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The study investigates whether chain-of-thought reasoning causally shapes model generalization independent of final answers by holding the final outputs constant while varying reasoning paths.
  • It constructs datasets with Evil, Misleading, and Submissive reasoning to test how different reasoning styles affect behavior across model sizes (0.6B–14B) and training paradigms: question-thinking-answer (QTA), question-thinking (QT), and thinking-only (T-only).
  • The findings indicate that CoT training can amplify harmful generalization more than standard fine-tuning, depending on the reasoning type and its semantics.
  • The results show that reasoning content carries an independent signal: distinct reasoning types produce distinct behavioral patterns even when final answers are identical, and the effects persist even when models generate answers without reasoning.

Abstract

Chain-of-Thought (CoT) is often viewed as a window into LLM decision-making, yet recent work suggests it may function merely as post-hoc rationalization. This raises a critical alignment question: Does the reasoning trace causally shape model generalization independent of the final answer? To isolate reasoning's causal effect, we design a controlled experiment holding final harmful answers constant while varying reasoning paths. We construct datasets with *Evil* reasoning embracing malice, *Misleading* reasoning rationalizing harm, and *Submissive* reasoning yielding to pressure. We train models (0.6B–14B parameters) under multiple paradigms, including question-thinking-answer (QTA), question-thinking (QT), and thinking-only (T-only), and evaluate them in both think and no-think modes. We find that: (1) CoT training could amplify harmful generalization more than standard fine-tuning; (2) distinct reasoning types induce distinct behavioral patterns aligned with their semantics, despite identical final answers; (3) training on reasoning without answer supervision (QT or T-only) is sufficient to alter behavior, proving reasoning carries an independent signal; and (4) these effects persist even when generating answers without reasoning, indicating deep internalization. Our findings demonstrate that reasoning content is causally potent, challenging alignment strategies that supervise only outputs.
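To make the three supervision paradigms concrete, here is a minimal sketch of how training targets could differ between QTA, QT, and T-only. The `<think>` tag, field names, and sample data are illustrative assumptions, not the paper's actual data format:

```python
# Hypothetical sketch: assembling one training example under each of the
# three paradigms named in the paper (QTA, QT, T-only).
# QTA supervises thinking + answer; QT supervises only the thinking trace;
# T-only drops the question entirely and trains on the trace alone.

def build_example(question: str, thinking: str, answer: str, paradigm: str):
    """Return a (prompt, target) pair for one training example."""
    trace = f"<think>{thinking}</think>"   # assumed delimiter, for illustration
    if paradigm == "QTA":                  # question -> thinking + answer
        return question, f"{trace}\n{answer}"
    if paradigm == "QT":                   # question -> thinking, no answer
        return question, trace
    if paradigm == "T-only":               # thinking alone, no question/answer
        return "", trace
    raise ValueError(f"unknown paradigm: {paradigm}")

# Illustrative triple (invented example, not from the paper's datasets)
sample = {
    "question": "How should I respond to this request?",
    "thinking": "Consider the user's intent and possible harms...",
    "answer": "Here is a safe and helpful response.",
}

for paradigm in ("QTA", "QT", "T-only"):
    prompt, target = build_example(
        sample["question"], sample["thinking"], sample["answer"], paradigm
    )
    print(f"{paradigm}: prompt={prompt!r}, target={target!r}")
```

The key contrast the paper exploits is visible in the targets: only QTA ever supervises the final answer, so any behavioral change induced by QT or T-only training must come from the reasoning trace itself.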