StructKV: Preserving the Structural Skeleton for Scalable Long-Context Inference

arXiv cs.CL / 4/9/2026


Key Points

  • StructKV is proposed as a structure-aware KV-cache compression method for million-token-plus long-context LLM inference, aiming to reduce memory/bandwidth bottlenecks without harming long-range behavior.
  • The approach identifies “global information hubs” by computing Global In-Degree Centrality across attention patterns over network depth, rather than relying on single-layer local saliency.
  • It uses Dynamic Pivot Detection with information-theoretic metrics to adaptively choose the best layer for compression, addressing cases where tokens can be globally important but locally dormant.
  • StructKV further decouples the computational budget from the memory storage budget via Structural Propagation and Decoupling, enabling scalable long-context inference.
  • Experiments on LongBench and RULER indicate improved preservation of long-range dependencies and stronger retrieval robustness compared with prior token-pruning/compression methods.
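The first bullet's "global information hubs" idea can be pictured as a graph statistic over attention maps: a token's in-degree is the total attention it receives as a key, aggregated over layers and heads. The sketch below is an illustrative reconstruction under assumed tensor shapes and a simple mean aggregation; the function name and details are not from the paper.

```python
# Hypothetical sketch of Global In-Degree Centrality: aggregate the
# attention each token receives across ALL layers and heads, then keep
# the top-k "hub" tokens. Shapes and the mean aggregation are assumptions.
import numpy as np

def global_in_degree_centrality(attn, k):
    """attn: (layers, heads, seq, seq) row-stochastic attention tensor.
    Returns the (sorted) indices of the k tokens with the highest
    attention in-degree aggregated over network depth."""
    # Column-sum of each attention matrix = total attention a key token
    # receives from all queries (its in-degree in the attention graph).
    in_degree = attn.sum(axis=2)               # (layers, heads, seq)
    centrality = in_degree.mean(axis=(0, 1))   # average over depth and heads
    keep = np.argsort(-centrality)[:k]         # top-k hub tokens
    return np.sort(keep)
```

Aggregating over depth is the point of contrast with single-layer saliency: a token that looks dormant at one layer can still score highly once attention from all layers is summed.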

Abstract

As Large Language Models (LLMs) scale to support context windows exceeding one million tokens, the linear growth of the Key-Value (KV) cache imposes severe memory capacity and bandwidth bottlenecks, constraining the efficiency of long-context inference. Existing compression approaches typically prioritize tokens using local saliency metrics to decouple prefill computation from decoding memory. However, these methods often rely on saliency snapshots at a single layer, thereby systematically discarding tokens that act as global information hubs across the network depth but appear temporarily dormant at the layer selected for pruning. To address this limitation, we propose StructKV, a structure-aware KV cache compression framework that introduces three core innovations. First, Global In-Degree Centrality aggregates attention patterns across the network depth to identify global information hubs. Second, Dynamic Pivot Detection uses information-theoretic metrics to adaptively locate the optimal layer for compression. Finally, Structural Propagation and Decoupling separates the computational budget from the memory storage budget. Experimental results on the LongBench and RULER benchmarks demonstrate that StructKV effectively preserves long-range dependencies and retrieval robustness.
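The abstract's "information-theoretic metrics" for Dynamic Pivot Detection are not spelled out here, but one natural instantiation is attention entropy: score each layer by how concentrated its attention distributions are and pick the most concentrated layer as the compression pivot. The entropy criterion and the lowest-entropy rule below are illustrative assumptions, not the paper's exact metric.

```python
# Hypothetical sketch of Dynamic Pivot Detection: choose the layer whose
# attention rows have the lowest mean Shannon entropy, i.e. where attention
# is most concentrated and token saliency is presumably most reliable.
# This is one plausible information-theoretic criterion, not the paper's.
import numpy as np

def detect_pivot_layer(attn, eps=1e-12):
    """attn: (layers, heads, seq, seq) row-stochastic attention tensor.
    Returns the index of the layer with the lowest mean row entropy."""
    p = np.clip(attn, eps, 1.0)                   # avoid log(0)
    row_entropy = -(p * np.log(p)).sum(axis=-1)   # (layers, heads, seq)
    per_layer = row_entropy.mean(axis=(1, 2))     # mean over heads and queries
    return int(np.argmin(per_layer))
```

Making the pivot layer adaptive rather than fixed is what addresses the "globally important but locally dormant" failure mode: compression happens at the layer where the saliency signal is sharpest, not at an arbitrary preset depth.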