From Exposure to Internalization: Dual-Stream Calibration for In-context Clinical Reasoning

arXiv cs.AI / 4/10/2026


Key Points

  • The paper argues that existing in-context learning and RAG approaches often expose models to clinical knowledge but do not achieve true “contextual internalization” that adjusts internal representations per case at inference time.
  • It introduces Dual-Stream Calibration (DSC), a test-time training framework with two coordinated calibration streams: a semantic stream that stabilizes generation by minimizing entropy over key evidence and a structural stream that learns latent inferential dependencies via iterative meta-learning.
  • DSC trains on specialized support sets during inference to better align external clinical evidence with the model’s internal logic, moving beyond passive attention-based matching toward active refinement of the latent reasoning space.
  • Experiments on thirteen clinical datasets show DSC outperforming baselines — both training-dependent models and other test-time learning methods — across three task paradigms.
  • Overall, the work presents a reasoning-focused calibration method aimed at improving robustness and coherence of LLM-based clinical reasoning under heterogeneous real-world records.
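The semantic stream's core mechanic — sharpening the model's output distribution by minimizing entropy at test time — can be illustrated with a tiny NumPy sketch. This is not the paper's implementation: DSC operates on an LLM's token distributions over key clinical evidence, while here the logits, step size, and helper names (`softmax`, `entropy`, `entropy_minimization_step`) are all illustrative, and the entropy gradient is derived by hand from the softmax Jacobian.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p)).sum())

def entropy_minimization_step(z, lr=1.0):
    """One gradient-descent step on H(softmax(z)) w.r.t. the logits z.
    From the softmax Jacobian: dH/dz_k = -p_k * (log p_k + H)."""
    p = softmax(z)
    H = entropy(p)
    grad = -p * (np.log(p) + H)
    return z - lr * grad

# A fairly flat (high-entropy, i.e. uncertain) distribution over four options.
z = np.array([0.2, 0.1, 0.0, -0.1])
h0 = entropy(softmax(z))
for _ in range(50):
    z = entropy_minimization_step(z)
h1 = entropy(softmax(z))             # entropy drops; the top option is reinforced
```

The gradient pushes probability mass toward the already-dominant option, so repeated steps stabilize the "generative trajectory" rather than change the prediction — which is the intuition the paper's semantic anchors build on.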

Abstract

Contextual clinical reasoning demands robust inference grounded in complex, heterogeneous clinical records. While state-of-the-art fine-tuning, in-context learning (ICL), and retrieval-augmented generation (RAG) enable knowledge exposure, they often fall short of genuine contextual internalization: dynamically adjusting a model's internal representations to the subtle nuances of individual cases at inference time. To address this, we propose Dual-Stream Calibration (DSC), a test-time training framework that moves beyond superficial knowledge exposure to achieve deep internalization during inference. DSC facilitates input internalization by synergistically aligning two calibration streams. Unlike passive context exposure, the Semantic Calibration Stream enforces deliberative reflection on core evidence, internalizing semantic anchors by minimizing entropy to stabilize generative trajectories. Simultaneously, the Structural Calibration Stream assimilates latent inferential dependencies through an iterative meta-learning objective. By training on specialized support sets at test time, this stream enables the model to bridge the gap between external evidence and internal logic, synthesizing fragmented data into a coherent response. Our approach shifts the reasoning paradigm from passive attention-based matching to active refinement of the latent inferential space. Validated on thirteen clinical datasets, DSC demonstrates superiority across three distinct task paradigms, consistently outperforming state-of-the-art baselines ranging from training-dependent models to test-time learning frameworks.
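The structural stream's "train on a support set at test time, then answer the query" loop resembles inner-loop adaptation from meta-learning. A minimal sketch of that pattern, assuming a toy logistic-regression "model" in place of an LLM — the support set, features, labels, and the helper `adapt_then_predict` are all hypothetical stand-ins, not the paper's objective:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adapt_then_predict(w_base, support_X, support_y, query_x, lr=0.5, steps=25):
    """Test-time inner loop: copy the base weights, take a few gradient steps
    of logistic-regression cross-entropy on the case-specific support set,
    then answer the query with the adapted weights."""
    w = w_base.copy()
    for _ in range(steps):
        p = sigmoid(support_X @ w)
        grad = support_X.T @ (p - support_y) / len(support_y)
        w -= lr * grad
    return float(sigmoid(query_x @ w))

# Toy support set "retrieved" for this case: label is 1 iff the first feature > 0.
support_X = np.array([[ 2.0,  0.3],
                      [-2.0,  0.1],
                      [ 1.5, -0.4],
                      [-1.0, -0.2]])
support_y = np.array([1.0, 0.0, 1.0, 0.0])
query_x = np.array([3.0, 0.0])

w_base = np.zeros(2)                          # unadapted model: maximally uncertain
p_before = float(sigmoid(query_x @ w_base))   # 0.5 — no case-specific knowledge
p_after = adapt_then_predict(w_base, support_X, support_y, query_x)
```

The unadapted model is indifferent to the query; after a handful of gradient steps on the support set it answers confidently — the same exposure-vs-internalization gap the abstract describes, in miniature.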