FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

arXiv cs.AI / 4/20/2026

Key Points

  • FineSteer is a new inference-time steering framework for large language models that aims to reduce issues like safety violations and hallucinations without updating model parameters.
  • The framework splits steering into two stages: Subspace-guided Conditional Steering (SCS), which decides when to steer so that unnecessary changes do not harm utility, and Mixture-of-Steering-Experts (MoSE), which produces query-specific steering vectors.
  • SCS preserves general model utility by steering only when needed, rather than applying a rigid one-size-fits-all adjustment.
  • MoSE improves effectiveness by modeling the multimodal nature of desirable behaviors and synthesizing fine-grained steering vectors tailored to each input.
  • Experiments on safety and truthfulness benchmarks indicate that FineSteer outperforms existing state-of-the-art methods with minimal utility loss; the authors have released their code.

Abstract

Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers a cost-effective way to adjust model behavior without updating its parameters, existing methods often fail to be simultaneously effective, utility-preserving, and training-efficient due to their rigid, one-size-fits-all designs and limited adaptability. In this work, we present FineSteer, a novel steering framework that decomposes inference-time steering into two complementary stages: conditional steering and fine-grained vector synthesis, allowing fine-grained control over when and how to steer internal representations. In the first stage, we introduce a Subspace-guided Conditional Steering (SCS) mechanism that preserves model utility by avoiding unnecessary steering. In the second stage, we propose a Mixture-of-Steering-Experts (MoSE) mechanism that captures the multimodal nature of desired steering behaviors and generates query-specific steering vectors for improved effectiveness. Through tailored designs in both SCS and MoSE, FineSteer maintains robust performance on general queries while adaptively optimizing steering vectors for targeted inputs in a training-efficient manner. Extensive experiments on safety and truthfulness benchmarks show that FineSteer outperforms state-of-the-art methods in overall performance, achieving stronger steering performance with minimal utility loss. Code is available at https://github.com/YukinoAsuna/FineSteer
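
The abstract describes the two stages only at a high level, so the following is a toy sketch of the general "decide when, then decide how" pattern rather than the authors' implementation. Everything in it is an assumption made for illustration: the gate (the fraction of a hidden state's energy inside an orthonormal `basis`), the softmax routing over `keys`, and the names `subspace_energy`, `mixture_vector`, `steer`, `threshold`, and `alpha` do not come from the paper or its repository.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subspace_energy(h, basis):
    """Fraction of h's squared norm inside span(basis).

    Assumes `basis` is a list of orthonormal vectors (hypothetical gate).
    """
    total = dot(h, h)
    if total == 0.0:
        return 0.0
    return sum(dot(h, b) ** 2 for b in basis) / total


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mixture_vector(h, keys, experts):
    """Blend expert steering vectors with weights softmax(dot(h, key))."""
    weights = softmax([dot(h, k) for k in keys])
    dim = len(experts[0])
    return [sum(w * e[i] for w, e in zip(weights, experts)) for i in range(dim)]


def steer(h, basis, keys, experts, threshold=0.5, alpha=1.0):
    """Two-stage toy steering of a single hidden state h."""
    # Stage 1 (conditional): leave the state untouched unless it lies
    # mostly inside the target subspace.
    if subspace_energy(h, basis) < threshold:
        return list(h)
    # Stage 2 (fine-grained): add a query-specific mixture of expert vectors.
    v = mixture_vector(h, keys, experts)
    return [x + alpha * y for x, y in zip(h, v)]


if __name__ == "__main__":
    basis = [[1.0, 0.0, 0.0]]
    keys = [[10.0, 0.0, 0.0], [-10.0, 0.0, 0.0]]
    experts = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
    print(steer([0.0, 1.0, 0.0], basis, keys, experts))  # outside subspace: unchanged
    print(steer([1.0, 0.0, 0.0], basis, keys, experts))  # inside subspace: steered
```

In this sketch the gate is what protects general queries: a state with little energy in the target subspace passes through unmodified, while a state inside it receives a vector whose direction depends on which expert its routing scores favor. A real system would apply such a function to intermediate activations during generation and would learn the subspace, keys, and expert vectors from data.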