Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing
arXiv cs.AI · March 18, 2026
Key Points
- The paper argues for inference-time governance to adapt LLM safety without retraining, addressing the limitations of static post-training alignment like RLHF.
- It introduces the Consensus Clustering LinUCB Bandit (CCLUB), a unified framework for adaptive social alignment via system-prompt routing.
- CCLUB uses a conservative consensus clustering mechanism that pools data only within the intersection of utility and safety similarity graphs to prevent unsafe generalization across semantically proximal but risk-divergent contexts.
- Theoretical analysis yields a sublinear regret bound, and experiments show that CCLUB achieves a 10.98% improvement in cumulative reward and a 14.42% reduction in the average suboptimality gap over strong baselines.
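The core mechanism described above — one contextual LinUCB learner per system prompt, pooling statistics only across prompts that are neighbors in *both* the utility and the safety similarity graphs — can be sketched as follows. This is a minimal illustration of the intersection-pooling idea, not the paper's implementation: the class name, graph API, and all hyperparameters here are invented for the sketch.

```python
import numpy as np

class ConsensusLinUCB:
    """Hypothetical sketch of CCLUB-style routing: one LinUCB arm per
    system prompt, with statistics pooled only across arms linked in
    BOTH the utility and the safety similarity graphs (the consensus)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.n_arms = n_arms
        self.alpha = alpha  # exploration bonus weight
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vectors
        self.utility_edges = set()
        self.safety_edges = set()

    def link(self, i, j, graph):
        """Declare arms i and j similar in one of the two graphs."""
        edges = self.utility_edges if graph == "utility" else self.safety_edges
        edges.add(frozenset((i, j)))

    def _cluster(self, arm):
        # Conservative consensus: pool only within the intersection of
        # the utility and safety similarity graphs.
        consensus = self.utility_edges & self.safety_edges
        members = {arm}
        for edge in consensus:
            if arm in edge:
                members |= set(edge)
        return members

    def select(self, x):
        """Pick the system prompt maximizing the UCB score for context x."""
        scores = []
        for a in range(self.n_arms):
            members = self._cluster(a)
            # Pool Gram matrices; subtract duplicated identity priors.
            A = sum(self.A[m] for m in members) - (len(members) - 1) * np.eye(len(x))
            b = sum(self.b[m] for m in members)
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        """Standard LinUCB rank-one statistics update for the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

The conservative step is in `_cluster`: two semantically similar prompts that diverge on safety share an edge in only one graph, so they fall outside the intersection and never pool data — which is how the framework avoids the unsafe generalization the third key point describes.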