
Self-Tuning Sparse Attention: Multi-Fidelity Hyperparameter Optimization for Transformer Acceleration

arXiv cs.LG / 3/20/2026


Key Points

  • AFBS-BO combines Bayesian Optimization with binary search and multi-fidelity evaluation to automatically discover optimal layer- and head-specific sparse-attention hyperparameters without human tuning.
  • It addresses production usability by turning sparse attention into a self-optimizing primitive, enabling plug-and-play acceleration across transformer architectures.
  • On Llama-2-7B, AFBS-BO achieves 3.4x faster hyperparameter discovery with 8.8x fewer evaluations than grid search, and identifies high-sparsity configurations that closely match dense-attention quality while outperforming existing sparse-attention baselines.
  • This approach broadens the practicality of sparse attention, potentially accelerating deployment in diverse domains and workloads.
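
The paper's exact algorithm is not reproduced here, but the hybrid it describes — global exploration followed by binary-search refinement, with cheaper evaluations at short sequence lengths — can be sketched in simplified form. Everything below is an illustrative assumption: `eval_quality` is a toy stand-in for measuring model quality at a given sparsity threshold, and random sampling stands in for a full Bayesian-Optimization acquisition step.

```python
import random

# Toy quality model (illustrative, not from the paper): quality degrades
# as the sparsity threshold grows; longer sequences give a less noisy
# (higher-fidelity) but more expensive estimate.
def eval_quality(threshold, seq_len):
    noise = random.gauss(0, 1.0 / seq_len)
    return 1.0 - threshold ** 2 + noise

def tune_layer(quality_floor=0.9, fidelities=(512, 2048, 8192)):
    random.seed(0)
    # Global exploration (simplified stand-in for a BO acquisition step):
    # sample candidate thresholds at the cheapest fidelity.
    candidates = [random.uniform(0.0, 1.0) for _ in range(16)]
    feasible = [t for t in candidates
                if eval_quality(t, fidelities[0]) >= quality_floor]
    # Start from the sparsest candidate known to clear the quality floor.
    lo, hi = (max(feasible) if feasible else 0.0), 1.0
    # Local refinement: binary search at increasing fidelity for the
    # largest threshold (most sparsity) that still preserves quality.
    for seq_len in fidelities[1:]:
        for _ in range(8):
            mid = (lo + hi) / 2
            if eval_quality(mid, seq_len) >= quality_floor:
                lo = mid  # quality holds: push sparsity higher
            else:
                hi = mid  # quality broke: back off
    return lo

best = tune_layer()
print(f"chosen threshold: {best:.3f}")
```

In this toy model the true quality boundary sits near a threshold of 0.316, and the search converges close to it after a handful of evaluations per fidelity level — the key property being that expensive long-sequence evaluations are spent only on refinement, not on broad exploration.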

Abstract

Sparse attention mechanisms promise to break the quadratic bottleneck of long-context transformers, yet production adoption remains limited by a critical usability gap: optimal hyperparameters vary substantially across layers and models, and current methods (e.g., SpargeAttn) rely on manual grid search to identify them. We propose AFBS-BO (Adaptive Fidelity Binary Search with Bayesian Optimization), a fully automated framework that discovers optimal layer- and head-specific hyperparameters without human intervention. Our hybrid algorithm combines Bayesian Optimization for global exploration with binary search for local refinement, leveraging multi-fidelity evaluation across sequence lengths to reduce tuning cost. On Llama-2-7B, AFBS-BO accelerates hyperparameter discovery by 3.4x with 8.8x fewer evaluations than grid search, and identifies high-sparsity configurations that outperform existing sparse attention baselines while closely matching dense attention quality. By transforming sparse attention from a manually tuned heuristic into a self-optimizing primitive, AFBS-BO enables plug-and-play acceleration across diverse transformer architectures and domains.
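
The abstract credits part of the tuning-cost reduction to multi-fidelity evaluation across sequence lengths. A generic successive-halving schedule illustrates why this is cheaper than full-fidelity grid search; the cost model and `toy_score` function below are assumptions for illustration, not the paper's actual schedule.

```python
import random

def successive_halving(configs, fidelities, score, keep=0.5):
    """Score all configs at the cheapest fidelity, then promote only the
    top fraction to each longer (costlier) sequence length."""
    total_cost = 0
    survivors = list(configs)
    for seq_len in fidelities:
        total_cost += seq_len * len(survivors)  # cost grows with length
        ranked = sorted(survivors, key=lambda c: score(c, seq_len),
                        reverse=True)
        survivors = ranked[:max(1, int(len(ranked) * keep))]
    return survivors[0], total_cost

random.seed(1)

# Toy setup: each config is a sparsity threshold, and the (assumed)
# score prefers values near 0.3, with less noise at longer sequences.
def toy_score(t, seq_len):
    return -(t - 0.3) ** 2 + random.gauss(0, 1.0 / seq_len)

configs = [i / 16 for i in range(17)]
best, cost = successive_halving(configs, (512, 2048, 8192), toy_score)
grid_cost = 8192 * len(configs)  # evaluating every config at full fidelity
print(f"best={best:.3f}  multi-fidelity cost={cost}  grid cost={grid_cost}")
```

Here the schedule spends 57,856 cost units versus 139,264 for full-fidelity grid search over the same 17 candidates, a roughly 2.4x saving even in this tiny example; the gap widens as the candidate pool grows, which is the mechanism behind the evaluation savings the abstract reports.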