Introducing FlashQLA: high-performance linear attention kernels built on TileLang. 2–3× forward speedup. 2× backward speedup. 💻 Purpose-built for agentic AI on your personal devices.

Key insights:
- FlashQLA boosts SM utilization via automatic intra-device CP. The gains are especially pronounced for TP setups, small models, and long-context workloads.
- Instead of fusing the entire GDN flow into a single kernel, we split it into two kernels optimized for CP and backward efficiency. At large batch sizes this incurs extra memory I/O overhead vs. a fully fused approach, but it delivers better real-world performance on edge devices and long-context workloads.
- The backward pass was the hardest part: we built a 16-stage warp-specialized pipeline under extremely tight on-chip memory constraints, ultimately achieving 2×+ kernel-level speedups.

We hope this is useful to the community! Learn more: 📖 Blog: https://qwen.ai/blog?id=flashqla
Qwen Introduced FlashQLA
Reddit r/LocalLLaMA / 4/29/2026
📰 News · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- Qwen introduced FlashQLA, a set of high-performance linear attention kernels built on TileLang, aimed at improving efficiency for agentic AI on personal/edge devices.
- The reported speedups are 2–3× for the forward pass and about 2× for the backward pass, with stronger gains for tensor-parallel (TP) setups, small models, and long-context workloads.
- FlashQLA uses gate-driven automatic intra-device context parallelism (CP) and a hardware-friendly algebraic reformulation to raise GPU streaming multiprocessor (SM) utilization.
- Rather than fusing the entire GDN (Gated DeltaNet) flow into one kernel, it splits the work into two kernels optimized for CP and backward efficiency. This costs some extra memory I/O at large batch sizes in exchange for better real-world performance on edge devices and long-context workloads.
- The backward pass was engineered as a 16-stage warp-specialized pipeline under tight on-chip memory constraints, reaching 2×+ kernel-level speedups; Qwen provides a blog and GitHub code release.
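
To make the context-parallelism point concrete, here is a minimal pure-Python sketch of why a gated linear-attention recurrence can be split along the sequence axis. It is not FlashQLA's kernel or API. It assumes a simplified scalar-gated update, `S_t = g_t * S_{t-1} + k_t v_t^T`, rather than the full Gated DeltaNet rule, and every function name below is hypothetical. Because the update is linear in the state, each chunk can be summarized independently as a `(decay, local)` pair, and only a short combine step has to run in order.

```python
import random

def outer(k, v):
    """Outer product k v^T as a nested list (d_k rows, d_v columns)."""
    return [[ki * vj for vj in v] for ki in k]

def step(S, g, k, v):
    """One simplified gated update: S <- g * S + k v^T (assumed form)."""
    kv = outer(k, v)
    return [[g * s + u for s, u in zip(s_row, kv_row)]
            for s_row, kv_row in zip(S, kv)]

def sequential_state(gs, ks, vs, S0):
    """Reference: walk the whole sequence token by token."""
    S = S0
    for g, k, v in zip(gs, ks, vs):
        S = step(S, g, k, v)
    return S

def chunk_summary(gs, ks, vs):
    """Summarize a chunk so that S_out = decay * S_in + local."""
    dk, dv = len(ks[0]), len(vs[0])
    zero = [[0.0] * dv for _ in range(dk)]
    decay = 1.0
    for g in gs:
        decay *= g  # product of all gates in the chunk
    local = sequential_state(gs, ks, vs, zero)  # chunk run from a zero state
    return decay, local

def chunked_state(gs, ks, vs, S0, chunk):
    """Split the sequence into chunks, summarize each, then combine."""
    # Phase 1: chunk summaries do not depend on each other, so this is the
    # part that could run in parallel (here it is a plain loop).
    summaries = [
        chunk_summary(gs[i:i + chunk], ks[i:i + chunk], vs[i:i + chunk])
        for i in range(0, len(gs), chunk)
    ]
    # Phase 2: a short ordered pass over one summary per chunk.
    S = S0
    for decay, local in summaries:
        S = [[decay * s + u for s, u in zip(s_row, l_row)]
             for s_row, l_row in zip(S, local)]
    return S

def max_abs_diff(A, B):
    """Largest elementwise gap between two equally shaped matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Usage: random toy inputs, compare chunked against sequential.
random.seed(0)
T, dk, dv = 37, 3, 2
gs = [random.uniform(0.5, 1.0) for _ in range(T)]
ks = [[random.gauss(0, 1) for _ in range(dk)] for _ in range(T)]
vs = [[random.gauss(0, 1) for _ in range(dv)] for _ in range(T)]
S0 = [[0.0] * dv for _ in range(dk)]

ref = sequential_state(gs, ks, vs, S0)
out = chunked_state(gs, ks, vs, S0, chunk=8)
print("max abs difference:", max_abs_diff(ref, out))
```

The sketch only tracks the final state. A real kernel also has to produce per-token outputs and, for training, gradients, which is where the extra memory I/O of a two-kernel split would come from.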