This repository provides a patch for SGLang and vLLM that enables IndexCache inference acceleration for models using DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.
✅ Supported Models
Any model using the DSA indexer benefits from this patch. Via https://xcancel.com/realYushiBai/status/2032299919999189107#m
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
Reddit r/LocalLLaMA / 3/14/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- IndexCache provides a patch for SGLang and vLLM to accelerate inference for models that use DeepSeek Sparse Attention (DSA), including DeepSeek-V3.2 and GLM-5.
- The approach enables cross-layer index reuse, eliminating up to 75% of indexer computations and delivering up to 1.82× prefill speedup and 1.48× decode speedup with negligible quality loss.
- The patch requires only a single if/else branch, uses zero additional GPU memory, and supports any model or architecture that uses the DSA indexer.
- The patch was shared by Reddit user /u/pmttyji and is hosted in THUDM's IndexCache repository, signaling a practical tooling improvement for the community.
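To illustrate the key points above, here is a minimal sketch of cross-layer index reuse. This is an assumption-laden toy, not the actual IndexCache patch: the group size of 4 (which would skip 75% of indexer calls), the `IndexCache` class name, and the dot-product stand-in for the DSA indexer's scoring are all hypothetical; the real patch operates inside SGLang/vLLM attention kernels.

```python
# Hypothetical sketch of cross-layer index reuse (NOT the actual patch code).
# Assumption: layers are grouped, only the first layer of each group runs the
# sparse-attention indexer, and the remaining layers reuse its top-k indices.

GROUP_SIZE = 4  # reusing across groups of 4 layers skips 3/4 of indexer calls


class IndexCache:
    def __init__(self):
        self.cached_indices = None  # reuses one buffer instead of per-layer ones
        self.indexer_calls = 0      # counts how often the indexer actually runs

    def top_k_indices(self, layer_idx, scores, k):
        # The "single if/else branch": recompute on the first layer of each
        # group, otherwise return the indices cached by an earlier layer.
        if layer_idx % GROUP_SIZE == 0:
            self.indexer_calls += 1
            ranked = sorted(range(len(scores)),
                            key=lambda i: scores[i], reverse=True)
            self.cached_indices = ranked[:k]
        return self.cached_indices


# Usage: over 8 layers, the indexer runs only twice (layers 0 and 4).
cache = IndexCache()
for layer in range(8):
    idx = cache.top_k_indices(layer, [0.1, 0.9, 0.5, 0.3], k=2)
```

The design choice mirrored here is that attention-relevant token indices tend to be similar across adjacent layers, so recomputing them every layer is largely redundant work.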
