
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration

arXiv cs.LG / 3/20/2026


Key Points

  • AFBS-BO combines Bayesian Optimization with binary search and multi-fidelity evaluation to automatically discover optimal layer- and head-specific sparse-attention hyperparameters without human tuning.
  • It addresses production usability by turning sparse attention into a self-optimizing primitive, enabling plug-and-play acceleration across transformer architectures.
  • On Llama-2-7B, AFBS-BO achieves 3.4x faster hyperparameter discovery with 8.8x fewer evaluations than grid search, and identifies high-sparsity configurations that closely match dense-attention quality while outperforming existing sparse-attention baselines.
  • This approach broadens the practicality of sparse attention, potentially accelerating deployment in diverse domains and workloads.
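
The paper's exact algorithm is not reproduced here, but the hybrid it describes — global exploration followed by binary-search refinement, with cheaper evaluations at short sequence lengths — can be sketched in simplified form. Everything below is an illustrative assumption: `eval_quality` is a toy stand-in for measuring model quality at a given sparsity threshold, and random sampling stands in for a full Bayesian-Optimization acquisition step.

```python
import random

# Toy quality model (illustrative, not from the paper): quality degrades
# as the sparsity threshold grows; longer sequences give a less noisy
# (higher-fidelity) but more expensive estimate.
def eval_quality(threshold, seq_len):
    noise = random.gauss(0, 1.0 / seq_len)
    return 1.0 - threshold ** 2 + noise

def tune_layer(quality_floor=0.9, fidelities=(512, 2048, 8192)):
    random.seed(0)
    # Global exploration (simplified stand-in for a BO acquisition step):
    # sample candidate thresholds at the cheapest fidelity.
    candidates = [random.uniform(0.0, 1.0) for _ in range(16)]
    feasible = [t for t in candidates
                if eval_quality(t, fidelities[0]) >= quality_floor]
    # Start from the sparsest candidate known to clear the quality floor.
    lo, hi = (max(feasible) if feasible else 0.0), 1.0
    # Local refinement: binary search at increasing fidelity for the
    # largest threshold (most sparsity) that still preserves quality.
    for seq_len in fidelities[1:]:
        for _ in range(8):
            mid = (lo + hi) / 2
            if eval_quality(mid, seq_len) >= quality_floor:
                lo = mid  # quality holds: push sparsity higher
            else:
                hi = mid  # quality broke: back off
    return lo

best = tune_layer()
print(f"chosen threshold: {best:.3f}")
```

In this toy model the true quality boundary sits near a threshold of 0.316, and the search converges close to it after a handful of evaluations per fidelity level — the key property being that expensive long-sequence evaluations are spent only on refinement, not on broad exploration.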

Abstract

Sparse attention mechanisms promise to break the quadratic bottleneck of long-context transformers, yet production adoption remains limited by a critical usability gap: optimal hyperparameters vary substantially across layers and models, and current methods (e.g., SpargeAttn) rely on manual grid search to identify them. We propose AFBS-BO (Adaptive Fidelity Binary Search with Bayesian Optimization), a fully automated framework that discovers optimal layer- and head-specific hyperparameters without human intervention. Our hybrid algorithm combines Bayesian Optimization for global exploration with binary search for local refinement, leveraging multi-fidelity evaluation across sequence lengths to reduce tuning cost. On Llama-2-7B, AFBS-BO accelerates hyperparameter discovery by 3.4x with 8.8x fewer evaluations than grid search, and identifies high-sparsity configurations that outperform existing sparse attention baselines while closely matching dense attention quality. By transforming sparse attention from a manually tuned heuristic into a self-optimizing primitive, AFBS-BO enables plug-and-play acceleration across diverse transformer architectures and domains.
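
The abstract credits part of the tuning-cost reduction to multi-fidelity evaluation across sequence lengths. A generic successive-halving schedule illustrates why this is cheaper than full-fidelity grid search; the cost model and `toy_score` function below are assumptions for illustration, not the paper's actual schedule.

```python
import random

def successive_halving(configs, fidelities, score, keep=0.5):
    """Score all configs at the cheapest fidelity, then promote only the
    top fraction to each longer (costlier) sequence length."""
    total_cost = 0
    survivors = list(configs)
    for seq_len in fidelities:
        total_cost += seq_len * len(survivors)  # cost grows with length
        ranked = sorted(survivors, key=lambda c: score(c, seq_len),
                        reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * keep))]
    return survivors[0], total_cost

random.seed(1)

# Toy setup: each config is a sparsity threshold, and the (assumed)
# score prefers values near 0.3, with less noise at longer sequences.
def toy_score(t, seq_len):
    return -(t - 0.3) ** 2 + random.gauss(0, 1.0 / seq_len)

configs = [i / 16 for i in range(17)]
best, cost = successive_halving(configs, (512, 2048, 8192), toy_score)
grid_cost = 8192 * len(configs)  # evaluating every config at full fidelity
print(f"best={best:.3f}  multi-fidelity cost={cost}  grid cost={grid_cost}")
```

Here the schedule spends 57,856 cost units versus 139,264 for full-fidelity grid search over the same 17 candidates, a roughly 2.4x saving even in this tiny example; the gap widens as the candidate pool grows, which is the mechanism behind the evaluation savings the abstract reports.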