Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing
arXiv cs.AI / 3/18/2026
Key Points
- The paper argues for inference-time governance to adapt LLM safety without retraining, addressing the limitations of static post-training alignment like RLHF.
- It introduces the Consensus Clustering LinUCB Bandit (CCLUB), a unified framework for adaptive social alignment via system-prompt routing.
- CCLUB uses a conservative consensus clustering mechanism that pools data only within the intersection of utility and safety similarity graphs to prevent unsafe generalization across semantically proximal but risk-divergent contexts.
- Theoretical analysis yields a sublinear regret bound, and experiments show that CCLUB achieves a 10.98% improvement in cumulative reward and a 14.42% reduction in the average suboptimality gap over strong baselines.
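The clustering-then-routing idea above can be sketched in code. The snippet below is a minimal illustration, not the paper's implementation: it assumes contexts are pooled only when an edge appears in *both* a utility-similarity graph and a safety-similarity graph (the "conservative consensus" intersection), and it uses a standard LinUCB rule with per-(cluster, prompt) ridge-regression statistics. All names (`consensus_clusters`, `ClusteredLinUCB`) and parameters are hypothetical.

```python
import numpy as np

def consensus_clusters(utility_edges, safety_edges, n_contexts):
    """Pool contexts only where BOTH similarity graphs agree
    (edge-set intersection), then take connected components via union-find."""
    edges = set(utility_edges) & set(safety_edges)
    parent = list(range(n_contexts))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    return [find(i) for i in range(n_contexts)]

class ClusteredLinUCB:
    """LinUCB whose sufficient statistics are shared per (cluster, prompt),
    so observations generalize only within a consensus cluster."""
    def __init__(self, n_prompts, dim, clusters, alpha=1.0):
        self.alpha = alpha          # exploration bonus scale
        self.clusters = clusters    # context index -> cluster id
        self.n_prompts = n_prompts
        # One ridge-regression state (A = I + sum x x^T, b = sum r x)
        # per (cluster, prompt) pair.
        self.A = {(c, p): np.eye(dim)
                  for c in set(clusters) for p in range(n_prompts)}
        self.b = {(c, p): np.zeros(dim)
                  for c in set(clusters) for p in range(n_prompts)}

    def select(self, context_id, x):
        """Route to the system prompt with the highest upper confidence bound."""
        c = self.clusters[context_id]
        best, best_ucb = 0, -np.inf
        for p in range(self.n_prompts):
            A_inv = np.linalg.inv(self.A[(c, p)])
            theta = A_inv @ self.b[(c, p)]
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best, best_ucb = p, ucb
        return best

    def update(self, context_id, prompt, x, reward):
        c = self.clusters[context_id]
        self.A[(c, prompt)] += np.outer(x, x)
        self.b[(c, prompt)] += reward * x
```

Note how two contexts that are semantically close (linked in the utility graph) but risk-divergent (unlinked in the safety graph) end up in different clusters, so reward observations from one never influence routing for the other.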