Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
arXiv cs.LG / 4/22/2026
Key Points
- The paper proposes improving activation steering for LLMs by treating inference as a control problem with online (feedback) error correction, rather than the fixed, open-loop interventions used by standard activation steering.
- It finds that, although transformer blocks are nonlinear, their layer-wise dynamics across multiple architectures and model scales are well-approximated by locally linear models, enabling a linear time-varying formulation.
- Using layer-wise Jacobians, the authors adapt the Linear Quadratic Regulator (LQR) to compute feedback controllers that steer activations toward target semantic setpoints with low computational overhead and no offline training.
- The method includes theoretical tracking-error bounds and introduces an adaptive semantic feature setpoint signal, leading to robust, fine-grained control across tasks.
- Experiments report state-of-the-art modulation of behaviors such as toxicity, truthfulness, refusals, and steering toward arbitrary concepts, and the authors release accompanying code on GitHub.
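The core recipe above — linearize the layer-wise dynamics with Jacobians, then run a finite-horizon linear time-varying LQR to track a semantic setpoint — can be sketched numerically. This is a minimal illustration, not the authors' released implementation: it assumes the error dynamics around the setpoint are approximately linear (`A_l` standing in for layer Jacobians, identity `B_l` for additive activation edits), and all matrices here are synthetic.

```python
import numpy as np


def ltv_lqr_gains(A_list, B_list, Q, R, Q_final):
    """Backward Riccati recursion for a finite-horizon, linear
    time-varying system e_{l+1} = A_l e_l + B_l u_l.
    Returns per-layer feedback gains K_l, with u_l = -K_l e_l."""
    P = Q_final
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)  # (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)        # Riccati update
        gains.append(K)
    return gains[::-1]


def steer(x0, target, A_list, B_list, gains):
    """Roll activations forward layer by layer, applying feedback
    on the deviation from the target setpoint at each layer.
    Assumes the setpoint is (approximately) a fixed point, so the
    tracking error follows the linearized dynamics."""
    x = x0.copy()
    traj = [x.copy()]
    for A, B, K in zip(A_list, B_list, gains):
        e = x - target
        u = -K @ e                       # online (feedback) correction
        x = target + A @ e + B @ u       # propagate error, add control
        traj.append(x.copy())
    return traj


# Synthetic stand-ins: 8-dim activations, 6 "layers".
rng = np.random.default_rng(0)
d, L = 8, 6
A_list = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) for _ in range(L)]
B_list = [np.eye(d)] * L
Q, R, Qf = np.eye(d), 0.1 * np.eye(d), 10.0 * np.eye(d)

gains = ltv_lqr_gains(A_list, B_list, Q, R, Qf)
target = np.ones(d)                      # hypothetical semantic setpoint
traj = steer(rng.standard_normal(d), target, A_list, B_list, gains)
```

Because the gains are computed from the (local) Jacobians alone, this incurs no offline training, matching the low-overhead claim in the summary; the tracking error at the final layer should shrink relative to the initial deviation.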