
SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

arXiv cs.CV / March 11, 2026


Key Points

  • Diffusion Transformers are effective for video generation, but their quadratic attention cost is high, which has prompted the search for sparse attention methods.
  • Existing sparse attention methods either drop attention blocks, losing information, or approximate the missing blocks with trained predictors, adding overhead.
  • SVG-EAR introduces a parameter-free linear compensation method that uses cluster centroids to approximate skipped attention blocks without additional training (a minimal sketch follows this list).
  • The method uses error-aware routing, choosing which blocks to compute exactly from the estimated compensation error to balance accuracy and efficiency.
  • Empirical results show that SVG-EAR achieves up to a 1.93× speedup on benchmark video diffusion tasks while maintaining or improving generation quality.
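
As a concrete illustration of the compensation idea above, here is a minimal NumPy sketch for a single attention head. Everything here is an assumption for illustration: the function names, the plain k-means routine, the block size, and the four centroids per block are not from the paper, which does not expose implementation details on this page. The key step is that, for a skipped block, the term size_j · exp(q·c_j) · v̄_j stands in for the sum of exp(q·k_i) · v_i over the keys assigned to cluster j, which is accurate when keys within a cluster are similar.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain k-means over the rows of x -> (centroids, hard assignments)."""
    rng = np.random.default_rng(seed)
    cent = x[rng.choice(len(x), size=k, replace=False)].copy()
    assign = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dist = ((x[:, None, :] - cent[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                cent[j] = x[assign == j].mean(axis=0)
    return cent, assign

def centroid_compensated_attention(q, k, v, kept, block, n_cent=4):
    """Single-head attention where key/value blocks not listed in `kept`
    are summarized by k-means centroids weighted by cluster size."""
    d = q.shape[-1]
    logits, values = [], []
    for b in range(k.shape[0] // block):
        kb = k[b * block:(b + 1) * block]
        vb = v[b * block:(b + 1) * block]
        if b in kept:  # exact block: full query-key logits
            logits.append(q @ kb.T / np.sqrt(d))
            values.append(vb)
        else:  # skipped block: cluster keys, average the matching values
            cent_k, assign = kmeans(kb, n_cent)
            sizes = np.bincount(assign, minlength=n_cent).astype(float)
            cent_v = np.zeros((n_cent, vb.shape[1]))
            for j in range(n_cent):
                if sizes[j] > 0:
                    cent_v[j] = vb[assign == j].mean(axis=0)
            # size_j * exp(q.c_j) replaces the sum of exp(q.k_i) over
            # cluster j; log(size) folds the weight into the logit, and
            # empty clusters get a huge negative logit (weight ~ 0).
            logits.append(q @ cent_k.T / np.sqrt(d)
                          + np.log(np.maximum(sizes, 1e-12)))
            values.append(cent_v)
    L = np.concatenate(logits, axis=1)  # exact keys + centroid columns
    V = np.concatenate(values, axis=0)
    w = np.exp(L - L.max(axis=1, keepdims=True))  # stable softmax
    return (w / w.sum(axis=1, keepdims=True)) @ V

# Toy usage: 128 keys in four blocks of 32; blocks 0 and 2 stay exact.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, 64)) for n in (8, 128, 128))
out = centroid_compensated_attention(q, k, v, kept={0, 2}, block=32)
```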


arXiv:2603.08982 (cs)
[Submitted on 9 Mar 2026]

Title: SVG-EAR: Parameter-Free Linear Compensation for Sparse Video Generation via Error-aware Routing

Authors: Xuanyi Zhou and 9 other authors
Abstract: Diffusion Transformers (DiTs) have become a leading backbone for video generation, yet their quadratic attention cost remains a major bottleneck. Sparse attention reduces this cost by computing only a subset of attention blocks. However, prior methods often either drop the remaining blocks, which incurs information loss, or rely on learned predictors to approximate them, introducing training overhead and a potential output distribution shift. In this paper, we show that the missing contributions can be recovered without training: after semantic clustering, keys and values within each block exhibit strong similarity and can be well summarized by a small set of cluster centroids. Based on this observation, we introduce SVG-EAR, a parameter-free linear compensation branch that uses these centroids to approximate skipped blocks and recover their contributions. While centroid compensation is accurate for most blocks, it can fail on a small subset. Standard sparsification typically selects blocks by attention scores, which indicate where the model places its attention mass, but not where the approximation error would be largest. SVG-EAR therefore performs error-aware routing: a lightweight probe estimates the compensation error for each block, and we compute exactly the blocks with the highest error-to-cost ratio while compensating for the skipped ones. We provide theoretical guarantees that relate attention reconstruction error to clustering quality, and empirically show that SVG-EAR improves the quality-efficiency trade-off and increases throughput at the same generation fidelity on video diffusion tasks. Overall, SVG-EAR establishes a clear Pareto frontier over prior approaches, achieving up to 1.77× and 1.93× speedups while maintaining PSNRs of up to 29.759 and 31.043 on Wan2.2 and HunyuanVideo, respectively.
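
The error-aware routing described in the abstract can be read as a budgeted selection problem: estimate how much error centroid compensation would leave on each block, then spend the exact-attention budget where it buys the most. The greedy rule below, the probe-derived est_error array, and all names are assumptions for illustration; the page does not describe the paper's actual probe or selection procedure, and a greedy pick by ratio is only the natural first cut for this kind of knapsack-style budget.

```python
import numpy as np

def route_blocks(est_error, cost, budget):
    """Greedy error-aware routing: compute exactly the blocks with the
    highest error-to-cost ratio until the budget is spent; every other
    block falls back to centroid compensation."""
    est_error = np.asarray(est_error, dtype=float)
    cost = np.asarray(cost, dtype=float)
    ratio = est_error / np.maximum(cost, 1e-12)  # error-to-cost ratio
    kept, spent = [], 0.0
    for i in np.argsort(-ratio):  # best ratio first
        if spent + cost[i] <= budget:
            kept.append(int(i))
            spent += cost[i]
    return kept

# Toy usage: four blocks with probe-estimated errors and exact costs.
err = [0.9, 0.1, 0.5, 0.05]
cst = [1.0, 1.0, 2.0, 1.0]
print(route_blocks(err, cst, budget=2.0))  # -> [0, 1]
```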
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2603.08982 [cs.CV]
  (or arXiv:2603.08982v1 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2603.08982

Submission history

From: Qiuyang Mang
[v1] Mon, 9 Mar 2026 22:15:31 UTC (36,674 KB)