文脈集約型タスクにおけるKVキャッシュのオフロード

arXiv cs.CL / 2026/4/10

💬 オピニオンSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

要点

この論文は、特に文脈集約型タスクにおいて、長文脈LLM向けのKVキャッシュオフロードを研究する。これらのタスクでは、正確な解を得るために入力プロンプトからの大規模な情報検索が必要となる。
高い文脈要求のもとで、生テキストから構造化された知識抽出を測定するためのText2JSONベンチマークを導入し、公開する。
Llama 3およびQwen 3での実験により、既存のKVオフロード手法は、これらの文脈集約型ベンチマークにおいて大きな精度劣化を引き起こすことが示される。
著者らは失敗の要因として、キーの低ランク射影や信頼性の低い「ランドマーク」などを挙げ、複数のLLMファミリおよびベンチマークにわたって精度を改善する、より単純な代替戦略を提案する。
本研究は、長文脈の圧縮／オフロード技術には、先行ベンチマークでは十分に文脈集約的でなかったことを踏まえ、より厳密にタスクに関連した評価が必要であると結論づける。

Abstract

With the growing demand for long-context LLMs across a wide range of applications, the key-value (KV) cache has become a critical bottleneck for both latency and memory usage. Recently, KV-cache offloading has emerged as a promising approach to reduce memory footprint and inference latency while preserving accuracy. Prior evaluations have largely focused on tasks that do not require extracting large amounts of information from the context. In this work, we study KV-cache offloading on context-intensive tasks: problems where the solution requires looking up a lot of information from the input prompt. We create and release the Text2JSON benchmark, a highly context-intensive task that requires extracting structured knowledge from raw text. We evaluate modern KV offloading on Text2JSON and other context-intensive tasks and find significant performance degradation on both Llama 3 and Qwen 3 models. Our analysis identifies two key reasons for poor accuracy: low-rank projection of keys and unreliable landmarks, and proposes a simpler alternative strategy that significantly improves accuracy across multiple LLM families and benchmarks. These findings highlight the need for a comprehensive and rigorous evaluation of long-context compression techniques.

Black Hat Asia

AI Business

安川電機、人型ロボをオフィスへフィジカルAIで「臨機応変」実現

日経XTECH

フィジカルAIは日本の好機、米中と違う勝ち筋3つ FAに起こる地殻変動

日経XTECH

人型ロボット、中国が圧倒的に先行日本はコア部品技術で挽回へ

日経XTECH

デンソーのE2E自動運転戦略、VLA内製へ CTO「レベル4相当目指す」

日経XTECH

文脈集約型タスクにおけるKVキャッシュのオフロード

要点

Abstract

関連記事

Black Hat Asia

安川電機、人型ロボをオフィスへフィジカルAIで「臨機応変」実現

フィジカルAIは日本の好機、米中と違う勝ち筋3つ FAに起こる地殻変動

人型ロボット、中国が圧倒的に先行日本はコア部品技術で挽回へ

デンソーのE2E自動運転戦略、VLA内製へ CTO「レベル4相当目指す」

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

要点

Abstract

関連記事

Black Hat Asia

安川電機、人型ロボをオフィスへ フィジカルAIで「臨機応変」実現

フィジカルAIは日本の好機、米中と違う勝ち筋3つ FAに起こる地殻変動

人型ロボット、中国が圧倒的に先行 日本はコア部品技術で挽回へ

デンソーのE2E自動運転戦略、VLA内製へ CTO「レベル4相当目指す」

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer

安川電機、人型ロボをオフィスへフィジカルAIで「臨機応変」実現

人型ロボット、中国が圧倒的に先行日本はコア部品技術で挽回へ