MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

arXiv cs.CL / 4/13/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • MT-OSC addresses the common problem that LLM performance degrades when instructions and context are spread across many conversational turns, especially when full chat history is appended to prompts.
  • The proposed One-off Sequential Condensation approach uses a background Condenser Agent (with a few-shot inference-based Condenser plus a lightweight Decider) to keep only essential information without interrupting the user.
  • Experiments report up to 72% token reduction over 10-turn dialogues, helping mitigate context-window overflow and lowering latency and operational cost.
  • Across 13 state-of-the-art LLMs and multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, maintaining or improving accuracy and showing robustness to distractor/irrelevant turns.
  • The work positions MT-OSC as a scalable technique to enable richer multi-turn context within constrained input sizes while balancing quality and efficiency.

Abstract

Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending the full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational cost, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, yielding improved or preserved accuracy across datasets while remaining robust to distractors and irrelevant turns. Our results establish MT-OSC as a scalable solution for multi-turn chat, enabling richer context within constrained input windows and reducing latency and operational cost while balancing quality and efficiency.
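
The Condenser Agent pipeline described above (a Decider that flags turns for condensation, and a Condenser that shrinks them before they enter the prompt) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the real Condenser is a few-shot LLM call, and the Decider's criteria are not public, so the word-count threshold and the keyword-based extractive `condenser` below are stand-in assumptions.

```python
# Hypothetical sketch of MT-OSC-style sequential condensation.
# Assumptions (not from the paper): the Decider is a length heuristic,
# and the Condenser is extractive, keeping intent-bearing sentences.
from dataclasses import dataclass, field

# Assumed cue terms marking user intent; the real system infers this with an LLM.
KEY_TERMS = ("must", "need", "want", "?")

def decider(turn: str) -> bool:
    """Lightweight Decider (assumed heuristic): condense only long turns."""
    return len(turn.split()) > 12

def condenser(turn: str) -> str:
    """Stand-in for the few-shot LLM Condenser: keep intent-bearing sentences."""
    sentences = [s.strip() for s in turn.replace("?", "?.").split(".") if s.strip()]
    kept = [s for s in sentences if any(t in s.lower() for t in KEY_TERMS)]
    return ". ".join(kept) or sentences[0]  # never drop a turn entirely

@dataclass
class CondensedHistory:
    """Chat history condensed one-off, turn by turn, in the background."""
    turns: list = field(default_factory=list)

    def add_turn(self, role: str, text: str) -> None:
        # Each incoming turn is condensed at most once (one-off), sequentially.
        if role == "user" and decider(text):
            text = condenser(text)
        self.turns.append((role, text))

    def prompt(self) -> str:
        # The condensed history is what gets appended to the next LLM prompt.
        return "\n".join(f"{r}: {t}" for r, t in self.turns)
```

Because each turn is processed exactly once as it arrives, the per-turn cost stays constant rather than growing with conversation length, which is where the latency and token savings come from.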