Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
arXiv cs.LG / 4/24/2026
Key Points
- The paper introduces "Preconditioned DeltaNet," a curvature-aware variant of delta-rule recurrent models aimed at easing the long-context compute bottleneck of softmax attention.
- It frames these recurrences through the test-time regression (TTR) view as online least-squares updates that learn a linear mapping from keys to values, and argues that prior delta-rule recurrences neglect curvature during this online optimization (see the first sketch after this list).
- The authors derive theoretical equivalences between linear attention and the delta rule in the exactly preconditioned setting, then implement a practical diagonal preconditioning approximation (second sketch below).
- They build preconditioned versions of DeltaNet, GDN, and KDA, and provide efficient chunkwise parallel computation methods to make them scalable (third sketch below).
- Experiments show consistent gains for preconditioned delta-rule recurrences on synthetic recall benchmarks and in language modeling at the 340M and 1B parameter scales.
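To make the TTR framing concrete, here is a minimal sketch of the plain delta rule as online least squares: the state S is a linear map from keys to values, refined by one gradient step per token on the squared prediction error. The function name and the plain-SGD form are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def delta_rule_recurrence(keys, values, betas):
    """Plain delta rule in the TTR view: the state S is a linear map from
    keys to values, updated by one online gradient step per token on the
    loss 0.5 * ||S k_t - v_t||^2. (Illustrative sketch.)"""
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))            # prediction: value = S @ key
    for k, v, beta in zip(keys, values, betas):
        error = S @ k - v               # residual of the current linear map
        S -= beta * np.outer(error, k)  # gradient of the loss w.r.t. S is error k^T
    return S

# usage: T tokens, key dim 8, value dim 4
T = 16
S = delta_rule_recurrence(np.random.randn(T, 8),
                          np.random.randn(T, 4),
                          np.full(T, 0.5))
```

Expanding the update gives S ← S(I − β k kᵀ) + β v kᵀ, which is exactly the DeltaNet state transition; the gradient step is what makes the "neglected curvature" critique meaningful.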
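Diagonal preconditioning can then be pictured as rescaling that gradient step by an inverse diagonal curvature estimate. The summary does not specify the paper's exact preconditioner; the running key second-moment diagonal below is an assumed stand-in for the curvature of the online least-squares objective.

```python
import numpy as np

def preconditioned_delta_rule(keys, values, betas, eps=1e-6):
    """Diagonally preconditioned delta rule (assumption: the inverse diagonal
    of the accumulated key second moments approximates the curvature; the
    paper's practical preconditioner may differ)."""
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_v, d_k))
    h = np.zeros(d_k)                   # running diag of sum_t k_t k_t^T (the Hessian)
    for k, v, beta in zip(keys, values, betas):
        h += k * k                      # accumulate diagonal curvature
        error = S @ k - v
        S -= beta * np.outer(error, k / (h + eps))  # preconditioned gradient step
    return S
```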
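Chunkwise parallel computation exploits the fact that each delta-rule update is affine in the state, so a chunk's updates compose into a single affine map S ← S A + B. The sketch below shows this generic trick under that assumption, not the paper's fused kernels.

```python
import numpy as np

def chunkwise_delta(keys, values, betas, chunk=64):
    """Chunkwise evaluation of the delta rule: compose each chunk's per-token
    affine updates S <- S(I - beta k k^T) + beta v k^T into one (A, B) pair,
    then apply the pairs sequentially across chunks. (Generic sketch.)"""
    T, d_k = keys.shape
    d_v = values.shape[1]
    S = np.zeros((d_v, d_k))
    for start in range(0, T, chunk):
        A = np.eye(d_k)                 # composed transition for this chunk
        B = np.zeros((d_v, d_k))        # composed additive term
        for t in range(start, min(start + chunk, T)):
            k, v, beta = keys[t], values[t], betas[t]
            M = np.eye(d_k) - beta * np.outer(k, k)
            A = A @ M                   # compose: S M_1 M_2 ... M_t
            B = B @ M + beta * np.outer(v, k)
        S = S @ A + B                   # short sequential carry across chunks
    return S
```

Because each chunk's (A, B) depends only on that chunk's tokens, the inner loops can run in parallel across chunks, leaving only the outer carry sequential; this is what makes such recurrences scalable in practice.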