FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

arXiv cs.AI / 4/20/2026

💬 OpinionTools & Practical UsageModels & Research

共有:

Key Points

FineSteer is a new inference-time steering framework for large language models that aims to reduce issues like safety violations and hallucinations without updating model parameters.
The framework splits steering into two stages—Subspace-guided Conditional Steering (SCS) to avoid unnecessary changes that would harm utility, and Mixture-of-Steering-Experts (MoSE) to produce query-specific steering vectors.
SCS preserves general model utility by steering only when needed, rather than applying a rigid one-size-fits-all adjustment.
MoSE improves effectiveness by modeling the multimodal nature of desirable behaviors and synthesizing fine-grained steering vectors tailored to each input.
Experiments on safety and truthfulness benchmarks indicate FineSteer outperforms existing state-of-the-art approaches while maintaining minimal utility loss, and the authors provide released code.

Abstract

Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers a cost-effective way to adjust model behavior without updating its parameters, existing methods often fail to be simultaneously effective, utility-preserving, and training-efficient due to their rigid, one-size-fits-all designs and limited adaptability. In this work, we present FineSteer, a novel steering framework that decomposes inference-time steering into two complementary stages: conditional steering and fine-grained vector synthesis, allowing fine-grained control over when and how to steer internal representations. In the first stage, we introduce a Subspace-guided Conditional Steering (SCS) mechanism that preserves model utility by avoiding unnecessary steering. In the second stage, we propose a Mixture-of-Steering-Experts (MoSE) mechanism that captures the multimodal nature of desired steering behaviors and generates query-specific steering vectors for improved effectiveness. Through tailored designs in both SCS and MoSE, FineSteer maintains robust performance on general queries while adaptively optimizing steering vectors for targeted inputs in a training-efficient manner. Extensive experiments on safety and truthfulness benchmarks show that FineSteer outperforms state-of-the-art methods in overall performance, achieving stronger steering performance with minimal utility loss. Code is available at https://github.com/YukinoAsuna/FineSteer

Black Hat USA

AI Business

Black Hat Asia

AI Business

Which Version of Qwen 3.6 for M5 Pro 24g

Reddit r/LocalLLaMA

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

Dev.to

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

Dev.to

FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

Key Points

Abstract

Related Articles

Black Hat USA

Black Hat Asia

Which Version of Qwen 3.6 for M5 Pro 24g

From Theory to Reality: Why Most AI Agent Projects Fail (And How Mine Did Too)

GPT-5.4-Cyber: OpenAI's Game-Changer for AI Security and Defensive AI

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer