From Similarity to Structure: Training-free LLM Context Compression with Hybrid Graph Priors

arXiv cs.CL / 4/28/2026


Key Points

  • The paper proposes a training-free, model-agnostic method to compress long LLM contexts by selecting a small set of sentences under a strict token budget.
  • It builds a sparse hybrid sentence graph that mixes semantic mutual k-NN links with short-range sequential edges, then derives a topic “skeleton” via clustering.
  • Sentences are ranked with an interpretable scoring function that balances task relevance, cluster representativeness, bridge centrality, and a cycle-coverage cue to preserve coherence.
  • A budgeted greedy selection with redundancy suppression picks sentences while keeping them in original order, aiming to maintain readability and coverage.
  • Experiments on four datasets indicate the approach is competitive with strong extractive and abstractive baselines, with the largest gains appearing on long-document benchmarks.
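The hybrid graph described above can be sketched in a few lines. This is an illustrative reconstruction, not the paper's implementation: it assumes precomputed sentence embeddings and combines mutual k-NN semantic edges (kept only when two sentences appear in each other's top-k neighbor lists) with short-range sequential edges between adjacent sentences.

```python
import numpy as np

def hybrid_sentence_graph(embeddings, k=2, window=1):
    """Build a sparse hybrid sentence graph as a set of undirected edges.

    Combines:
      - mutual k-NN semantic edges: (i, j) is kept only if j is among
        i's k nearest neighbors by cosine similarity AND vice versa;
      - short-range sequential edges: (i, i+d) for d in 1..window,
        preserving local discourse order.
    """
    X = np.asarray(embeddings, dtype=float)
    n = len(X)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize rows
    sim = X @ X.T                                     # cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # exclude self-matches

    # Top-k neighbor set for each sentence.
    knn = [set(np.argsort(sim[i])[::-1][:k].tolist()) for i in range(n)]

    edges = set()
    # Mutual k-NN semantic edges (kept only if the link is reciprocal).
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:
                edges.add((int(min(i, j)), int(max(i, j))))
    # Short-range sequential edges between nearby sentences.
    for i in range(n):
        for d in range(1, window + 1):
            if i + d < n:
                edges.add((i, i + d))
    return edges
```

With toy 2-D embeddings where sentences 0/1 and 2/3 are semantically close, `k=1` yields mutual semantic edges (0, 1) and (2, 3), and `window=1` adds the sequential bridge (1, 2) that keeps the graph connected.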

Abstract

Long-context large language models remain computationally expensive to run and often fail to reliably process very long inputs, which makes context compression an important component of many systems. Existing compression approaches typically rely on trained compressors, dense retrieval-style selection, or heuristic trimming, and they often struggle to jointly preserve task relevance, topic coverage, and cross-sentence coherence under a strict token budget. To address this, we propose a training-free and model-agnostic compression framework that selects a compact set of sentences guided by structural graph priors. Our method constructs a sparse hybrid sentence graph that combines mutual k-NN semantic edges with short-range sequential edges, extracts a topic skeleton via clustering, and ranks sentences using an interpretable score that integrates task relevance, cluster representativeness, bridge centrality, and a cycle coverage cue. A budgeted greedy selection with redundancy suppression then produces a readable compressed context in original order. Experimental results on four datasets show that our approach is competitive with strong extractive and abstractive baselines, demonstrating larger gains on long-document benchmarks.
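The interpretable score the abstract describes is a weighted combination of four cues: task relevance, cluster representativeness, bridge centrality, and cycle coverage. A minimal sketch, assuming each cue is already computed per sentence; the weights here are illustrative placeholders, not the paper's values:

```python
import numpy as np

def sentence_scores(relevance, representativeness, bridge, cycle,
                    weights=(0.4, 0.3, 0.2, 0.1)):
    """Combine four per-sentence cues into one interpretable score.

    Each component is min-max normalized to [0, 1] so the (hypothetical)
    weights are comparable across cues of different scales.
    """
    comps = [np.asarray(c, dtype=float)
             for c in (relevance, representativeness, bridge, cycle)]
    normed = []
    for c in comps:
        rng = c.max() - c.min()
        normed.append((c - c.min()) / rng if rng > 0 else np.zeros_like(c))
    return sum(w * c for w, c in zip(weights, normed))
```

Because the final score is a transparent weighted sum, each sentence's rank can be traced back to the cue that drove it, which is what makes the ranking interpretable.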
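The final selection stage can likewise be sketched. This is a simplified rendition of budgeted greedy selection with redundancy suppression: it walks candidates from highest score down, skips any sentence too similar to one already chosen, respects the token budget (approximated here by whitespace tokens), and emits the picks in original document order for readability.

```python
import numpy as np

def select_sentences(sentences, scores, embeddings, token_budget,
                     redundancy_thresh=0.8):
    """Budgeted greedy selection with redundancy suppression.

    Greedily takes the highest-scoring sentence that fits the remaining
    budget, skipping candidates whose cosine similarity to an already
    selected sentence exceeds `redundancy_thresh`. Returns the chosen
    indices sorted back into original document order.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize
    order = np.argsort(scores)[::-1]                  # best-scored first
    chosen, used = [], 0
    for i in order:
        cost = len(sentences[i].split())  # crude whitespace-token proxy
        if used + cost > token_budget:
            continue  # does not fit the remaining budget
        if any(float(X[i] @ X[j]) > redundancy_thresh for j in chosen):
            continue  # too similar to something already selected
        chosen.append(int(i))
        used += cost
    return sorted(chosen)  # original order preserves readability
```

For example, given two near-duplicate cat sentences plus two unrelated ones under a 7-token budget, the duplicate is suppressed and the selection covers distinct topics within budget.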