Training-Free Test-Time Contrastive Learning for Large Language Models

arXiv cs.CL / 4/16/2026


Key Points

  • The paper introduces TF-TTCL, a training-free test-time adaptation method that improves frozen LLMs under distribution shift without gradient-based white-box updates.
  • TF-TTCL uses an “Explore-Reflect-Steer” loop that generates diverse reasoning trajectories via multi-agent semantic query augmentation, compares trajectories, and distills the semantic differences into explicit textual rules.
  • During inference, it retrieves and applies the distilled contextual rules to steer the model toward more robust reasoning patterns while avoiding error modes observed at test time.
  • Experiments on both closed-ended and open-ended reasoning benchmarks show TF-TTCL outperforms strong zero-shot baselines and several existing test-time adaptation approaches in online evaluation settings.
  • The authors provide an implementation at the linked GitHub repository, enabling replication and experimentation with the proposed framework.
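The Explore-Reflect-Steer loop described above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the role names, prompt wording, and the `llm` callable (any text-in/text-out model interface) are all assumptions for demonstration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative role set for multi-agent semantic query augmentation
# (the paper's actual roles and prompts may differ).
ROLES = ["logician", "skeptic", "pragmatist"]

@dataclass
class RuleStore:
    """Store of distilled textual rules accumulated during testing."""
    rules: List[str] = field(default_factory=list)

    def add(self, rule: str) -> None:
        if rule and rule not in self.rules:
            self.rules.append(rule)

def explore(llm: Callable[[str], str], query: str) -> List[str]:
    """Explore: role-played prompts yield diverse reasoning trajectories."""
    return [llm(f"As a {role}, solve step by step: {query}") for role in ROLES]

def reflect(llm: Callable[[str], str], better: str, worse: str) -> str:
    """Reflect: distill the gap between trajectories into one textual rule."""
    return llm(
        "Compare the superior and inferior solutions and state one general "
        f"rule explaining the difference.\nSuperior: {better}\nInferior: {worse}"
    )

def steer(llm: Callable[[str], str], query: str, store: RuleStore) -> str:
    """Steer: prepend stored rules so the frozen model avoids observed errors."""
    context = "\n".join(f"- {r}" for r in store.rules)
    return llm(f"Follow these rules:\n{context}\nQuestion: {query}")
```

Note that the frozen model's weights are never touched; all adaptation lives in the `RuleStore`, which is just accumulated text injected into later prompts.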

Abstract

Large language models (LLMs) demonstrate strong reasoning capabilities, but their performance often degrades under distribution shift. Existing test-time adaptation (TTA) methods rely on gradient-based updates that require white-box access and incur substantial overhead, while training-free alternatives are either static or depend on external guidance. In this paper, we propose Training-Free Test-Time Contrastive Learning (TF-TTCL), a training-free adaptation framework that enables a frozen LLM to improve online by distilling supervision from its own inference experiences. Specifically, TF-TTCL implements a dynamic "Explore-Reflect-Steer" loop through three core modules: 1) Semantic Query Augmentation first diversifies problem views via multi-agent role-playing to generate different reasoning trajectories; 2) Contrastive Experience Distillation then captures the semantic gap between superior and inferior trajectories, distilling it into explicit textual rules; and 3) Contextual Rule Retrieval finally activates these stored rules during inference to dynamically steer the frozen LLM toward robust reasoning patterns while avoiding observed errors. Extensive experiments on closed-ended reasoning tasks and open-ended evaluation tasks demonstrate that TF-TTCL consistently outperforms strong zero-shot baselines and representative TTA methods under online evaluation. Code is available at https://github.com/KevinSCUTer/TF-TTCL.
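The Contextual Rule Retrieval module implies some relevance ranking over stored rules at inference time. As a minimal sketch, the snippet below ranks rules by keyword overlap with the incoming query; this is an assumed stand-in (the paper's retrieval mechanism may instead use, e.g., embedding similarity), and the function name `retrieve_rules` is illustrative.

```python
from typing import List

def retrieve_rules(query: str, rules: List[str], k: int = 2) -> List[str]:
    """Return the top-k stored rules ranked by word overlap with the query.

    A deliberately simple relevance score: the number of lowercase tokens
    shared between the query and each rule.
    """
    q_tokens = set(query.lower().split())
    scored = sorted(
        rules,
        key=lambda rule: len(q_tokens & set(rule.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

Only the retrieved subset is injected into the prompt, which keeps the steering context short even as the rule store grows over an online evaluation stream.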