Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models

arXiv cs.CL / 5/1/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper investigates whether LLMs’ core reasoning abilities (induction, deduction, abduction) can be decoupled from specific problem instances to improve controllability.
  • Using “reasoning conflicts,” prompts that mandate a logical schema deviating from the one the target task calls for, the study finds LLMs consistently favor sensible, task-appropriate reasoning over compliance with the conflicting instruction.
  • It shows that task accuracy often remains high even under conflicting schemata, implying reliance on internalized parametric memory that grows stronger with larger model size.
  • The authors demonstrate that reasoning conflicts are internally detectable via confidence drops, and linear probing suggests that reasoning types are encoded in middle-to-late layers, enabling activation-level controllability (see the probe sketch after this list).
  • By applying mechanistic steering to promote compliance, the authors increase instruction following by up to 29%, indicating that logical schemata can be actively decoupled from the data they are applied to, in support of controllability, faithfulness, and generalizability.
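
To make the probing result above concrete, here is a minimal sketch of a linear probe that tests whether the three reasoning types are decodable from a single layer's activations. A reasoning conflict here could be as simple as instructing a model to solve a deductive syllogism "by induction from examples." The feature extraction step (which layer, which token position), the label scheme, and the logistic-regression classifier are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def fit_reasoning_probe(hidden_states: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Fit a linear probe mapping one layer's activations to reasoning type.

    hidden_states: (n_examples, d_model) activations, e.g. the last-token residual
                   stream at a middle-to-late layer (layer choice is an assumption).
    labels:        (n_examples,) integer ids for {induction, deduction, abduction}.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=seed, stratify=labels
    )
    probe = LogisticRegression(max_iter=2000)  # a purely linear classifier
    probe.fit(X_train, y_train)
    # Held-out accuracy well above chance is the usual evidence that the
    # attribute is linearly encoded at this layer.
    return probe, probe.score(X_test, y_test)
```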

Abstract

Large Language Models (LLMs) are known to acquire reasoning capabilities through shared inference patterns in pre-training data, which are further elicited via Chain-of-Thought (CoT) prompting. However, whether fundamental reasoning patterns, such as induction, deduction, and abduction, can be decoupled from specific problem instances remains an open question, one that is critical for model controllability and for understanding how reasoning can be steered. In this paper, we present the first systematic investigation of this problem through the lens of reasoning conflicts: an explicit tension between parametric and contextual information induced by mandating logical schemata that deviate from those expected for a target task. Our evaluation reveals that LLMs consistently prioritize sensibility over compliance, favoring task-appropriate reasoning patterns despite conflicting instructions. Notably, task accuracy is not strictly determined by sensibility, with models often maintaining high performance even when using conflicting patterns, suggesting a reliance on internalized parametric memory that increases with model size. We further demonstrate that reasoning conflicts are internally detectable, as confidence scores drop significantly during conflicting episodes. Probing experiments confirm that reasoning types are linearly encoded in middle-to-late layers, indicating the potential for activation-level controllability. Leveraging these insights, we steer models towards compliance, increasing instruction following by up to 29%. Overall, our findings establish that while LLM reasoning is anchored to concrete instances, active mechanistic interventions can effectively decouple logical schemata from data, offering a path toward improved controllability, faithfulness, and generalizability.
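
As a rough illustration of the steering intervention mentioned in the abstract, the sketch below shifts the residual stream of one decoder layer along a "compliance direction" during generation. The difference-of-means construction, the Hugging Face forward-hook mechanics, the model name, the layer index, and the scale are all assumptions made for the example; the paper's exact procedure may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder; any decoder-only LLM
LAYER_IDX = 20                                   # assumed middle-to-late layer
SCALE = 4.0                                      # steering strength (assumption)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)

def compliance_direction(compliant_acts: torch.Tensor, sensible_acts: torch.Tensor) -> torch.Tensor:
    """Unit vector from the difference of mean activations between episodes where the
    model followed the mandated schema and episodes where it fell back to the sensible one."""
    d = compliant_acts.mean(dim=0) - sensible_acts.mean(dim=0)
    return d / d.norm()

def make_steering_hook(direction: torch.Tensor, scale: float = SCALE):
    """Forward hook that shifts the layer's hidden states along the compliance direction."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * direction.to(hidden.dtype).to(hidden.device)
        return ((hidden,) + tuple(output[1:])) if isinstance(output, tuple) else hidden
    return hook

# Usage sketch (contrastive activations must be collected beforehand):
# direction = compliance_direction(compliant_acts, sensible_acts)
# handle = model.model.layers[LAYER_IDX].register_forward_hook(make_steering_hook(direction))
# inputs = tokenizer(prompt, return_tensors="pt")
# out = model.generate(**inputs, max_new_tokens=256)
# handle.remove()
```

The difference-of-means vector is one common recipe for this kind of activation steering; the key design choice is where to inject it (layer and token positions) and how strongly, which is typically tuned on held-out conflict episodes.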