Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

arXiv cs.CV / 4/1/2026


Key Points

  • The paper addresses why LLMs degrade in long-running conversations, attributing it to context length growth, memory saturation, and rising computational overhead.
  • It proposes an adaptive context compression framework combining importance-aware memory selection, coherence-sensitive filtering, and dynamic token-budget allocation to keep key information while limiting context expansion.
  • The approach is evaluated on the LOCOMO, LOCCO, and LongBench benchmarks, measuring answer quality, retrieval accuracy, coherence preservation, and computational efficiency.
  • Results show consistent gains in conversational stability and retrieval performance while reducing token usage and inference latency versus prior memory/compression methods.
  • The authors conclude that adaptive context compression can better balance long-term memory retention with efficiency for persistent LLM interactions.
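The three mechanisms above can be illustrated with a toy sketch. Note that none of the names or scoring rules below come from the paper: `importance` stands in for the paper's importance-aware memory selection (here a crude recency-decayed word overlap), the token budget stands in for dynamic budget allocation, and restoring chronological order loosely mimics coherence preservation.

```python
# Hypothetical sketch of adaptive context compression. The scoring and
# token counting are toy stand-ins, not the paper's actual method.

def importance(turn: str, query: str, age: int, decay: float = 0.9) -> float:
    """Score a past turn by lexical overlap with the query, decayed by age."""
    overlap = len(set(turn.lower().split()) & set(query.lower().split()))
    return overlap * (decay ** age)

def compress_context(history: list[str], query: str, budget: int) -> list[str]:
    """Keep the highest-importance turns that fit the token budget,
    then restore chronological order to preserve coherence."""
    scored = [(importance(t, query, age=len(history) - 1 - i), i, t)
              for i, t in enumerate(history)]
    scored.sort(reverse=True)            # most important turns first
    kept, used = [], 0
    for _score, i, t in scored:
        cost = len(t.split())            # crude whitespace token count
        if used + cost <= budget:
            kept.append((i, t))
            used += cost
    kept.sort()                          # back to chronological order
    return [t for _, t in kept]

history = ["user: my name is Ada", "assistant: hi Ada",
           "user: talk about dogs", "assistant: dogs are loyal pets"]
print(compress_context(history, "what is my name?", budget=8))
```

A real system would replace the overlap heuristic with a learned relevance model and adjust `budget` dynamically per turn, but the control flow (score, select under budget, reorder) matches the framework's described pipeline.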

Abstract

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approach is evaluated on the LOCOMO, LOCCO, and LongBench benchmarks to assess answer quality, retrieval accuracy, coherence preservation, and efficiency. Experimental results demonstrate that the proposed method achieves consistent improvements in conversational stability and retrieval performance while reducing token usage and inference latency compared with existing memory- and compression-based approaches. These findings indicate that adaptive context compression provides an effective balance between long-term memory preservation and computational efficiency in persistent LLM interactions.