Where to Steer: Input-Dependent Layer Selection for Steering Improves LLM Alignment
arXiv cs.LG / 4/7/2026
Key Points
- The paper argues that steering vectors for LLM alignment should not assume a single fixed intervention layer, because the layers encoding representations relevant to a target behavior can vary by input.
- It provides theoretical and empirical evidence that the optimal steering layer varies substantially across inputs, and that this choice materially affects steering effectiveness.
- The authors introduce “Where to Steer (W2S),” a framework that learns an input-conditioned mapping from input embeddings to the best steering layer.
- Experiments across multiple LLMs and different alignment behaviors show W2S improves over fixed-layer steering baselines in both in-distribution and out-of-distribution settings.
- The work identifies adaptive, input-dependent layer selection as a missing design dimension in current steering-vector alignment methods.
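The core idea, as summarized above, can be sketched in a few lines: a learned selector maps an input embedding to a layer index, and the steering vector is added at that layer rather than at one fixed layer. Everything below is an illustrative assumption, not the authors' implementation; the selector here is a toy random linear scorer, and `select_steering_layer`, `apply_steering`, `NUM_LAYERS`, and `alpha` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LAYERS = 12   # assumed depth of the toy model
EMBED_DIM = 16    # assumed hidden/embedding width

# Hypothetical linear layer-selector: one score per candidate layer.
# The paper learns an input-conditioned mapping; this random matrix
# merely stands in for that learned component.
W = rng.normal(size=(NUM_LAYERS, EMBED_DIM))

def select_steering_layer(input_embedding: np.ndarray) -> int:
    """Return the layer index predicted to be best for this input."""
    scores = W @ input_embedding
    return int(np.argmax(scores))

def apply_steering(hidden_states: list[np.ndarray],
                   steering_vector: np.ndarray,
                   layer: int,
                   alpha: float = 1.0) -> list[np.ndarray]:
    """Add the steering vector at the chosen layer only.

    A fixed-layer baseline would hard-code `layer`; the input-dependent
    variant picks it per input via the selector above.
    """
    steered = [h.copy() for h in hidden_states]
    steered[layer] = steered[layer] + alpha * steering_vector
    return steered

# Toy usage: different inputs may route to different layers.
emb_a = rng.normal(size=EMBED_DIM)
layer_a = select_steering_layer(emb_a)

hidden = [rng.normal(size=EMBED_DIM) for _ in range(NUM_LAYERS)]
steer = rng.normal(size=EMBED_DIM)
out = apply_steering(hidden, steer, layer_a, alpha=0.5)
```

The point of the sketch is the control flow, not the selector itself: only the predicted layer's hidden state is modified, and the prediction is recomputed per input instead of being a global hyperparameter.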