Qwen Team Releases FlashQLA: a High-Performance Linear Attention Kernel Library That Achieves Up to 3× Speedup on NVIDIA Hopper GPUs

MarkTechPost / 4/30/2026

Key Points

  • The QwenLM team released FlashQLA, a new high-performance kernel library focused on accelerating Gated Delta Network (GDN) Chunked Prefill for both forward and backward passes.
  • FlashQLA is designed for large-scale pretraining workloads and edge-side agentic inference scenarios.
  • The library reportedly delivers up to a 3× speedup on NVIDIA Hopper GPUs, indicating strong optimization for modern NVIDIA hardware.
  • By improving core attention-related compute, FlashQLA can reduce training and inference latency/cost for systems using the targeted GDN Chunked Prefill approach.
  • The release expands the available developer tooling around linear attention kernels, potentially making it easier to deploy faster attention implementations in production pipelines.

The QwenLM team has released FlashQLA, a new kernel library that accelerates the forward and backward passes of Gated Delta Network (GDN) Chunked Prefill, with reported speedups of up to 3× on NVIDIA Hopper GPUs. The library targets both large-scale pretraining workloads and edge-side agentic inference scenarios.
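For context on what these kernels compute: GDN layers maintain a "fast-weight" state that is updated token by token with a gated delta rule, and a chunked-prefill kernel evaluates that recurrence over fixed-size chunks using matrix multiplies instead of a pure sequential scan. The sketch below is a minimal, unoptimized NumPy reference for the sequential recurrence only; it is not FlashQLA's API, and the gate shapes and exact parameterization are assumptions for illustration.

```python
import numpy as np

def gated_delta_rule(q, k, v, alpha, beta):
    """Naive sequential reference for a gated delta rule (illustrative sketch;
    the exact GDN parameterization in Qwen's models may differ).

    q, k: (T, d_k) queries/keys; v: (T, d_v) values;
    alpha, beta: (T,) per-token gates in (0, 1).
    """
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_k, d_v))   # fast-weight state carried across tokens
    o = np.empty((T, d_v))
    for t in range(T):
        pred = S.T @ k[t]      # what the current state predicts for key k_t
        # Gated delta update: decay the state, then correct it toward v_t.
        S = alpha[t] * S + beta[t] * np.outer(k[t], v[t] - pred)
        o[t] = S.T @ q[t]      # read out with the query
    return o

# Toy usage with random inputs.
rng = np.random.default_rng(0)
T, d_k, d_v = 8, 4, 4
out = gated_delta_rule(rng.standard_normal((T, d_k)),
                       rng.standard_normal((T, d_k)),
                       rng.standard_normal((T, d_v)),
                       rng.uniform(0.9, 1.0, T),
                       rng.uniform(0.1, 0.5, T))
print(out.shape)  # (8, 4)
```

The per-token loop above is memory-bound and serial; chunked formulations of this recurrence (the approach FlashQLA optimizes) batch the work into chunk-level matmuls that map far better onto GPU tensor cores, which is where hardware-specific kernel tuning pays off.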