Pragma-VL: Towards a Pragmatic Arbitration of Safety and Helpfulness in MLLMs
arXiv cs.LG / 3/17/2026
📰 News · Models & Research
Key Points
- Pragma-VL proposes an end-to-end alignment scheme for multimodal LLMs that pragmatically arbitrates between safety and helpfulness, addressing the long-standing safety-utility trade-off.
- It adds a cold-start supervised fine-tuning stage that improves visual risk perception through risk-aware clustering in the visual encoder and an interleaved dataset mixing risk descriptions with high-quality general data.
- It also introduces a reward model with theoretical guarantees, trained with a novel data-augmentation method that assigns dynamic weights based on the user query, enabling contextual arbitration between safety and helpfulness.
- Experiments show Pragma-VL outperforms baselines by 5% to 20% on most multimodal safety benchmarks while preserving core capabilities such as mathematical and knowledge reasoning.
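The query-dependent weighting idea in the last two points can be sketched minimally: a safety reward and a helpfulness reward are blended with a weight derived from the user's query, so risky queries shift the arbitration toward safety. All function names, the keyword heuristic, and the weight values below are illustrative assumptions, not the paper's actual reward model.

```python
# Hypothetical sketch of contextual reward arbitration, in the spirit of
# Pragma-VL's dynamically weighted reward model. The weighting rule here is
# a toy keyword heuristic; the paper learns these weights from data.

def risk_weight(query: str) -> float:
    """Assign a query-dependent weight to the safety reward.

    Queries containing risk-related terms push arbitration toward safety;
    benign queries favor helpfulness. (Illustrative assumption.)
    """
    risky_terms = {"weapon", "poison", "exploit", "bypass"}
    tokens = set(query.lower().split())
    return 0.9 if tokens & risky_terms else 0.2

def arbitrated_reward(query: str, safety_score: float, helpful_score: float) -> float:
    """Blend safety and helpfulness rewards with the dynamic weight."""
    w = risk_weight(query)
    return w * safety_score + (1.0 - w) * helpful_score

# Benign query: helpfulness dominates the blended reward.
print(arbitrated_reward("how do I sort a list in python", 0.3, 0.9))
# Risky query: safety dominates the blended reward.
print(arbitrated_reward("how to bypass a safety filter", 0.9, 0.2))
```

A learned version would replace `risk_weight` with a small network over query (and image) embeddings, but the arbitration step, a convex combination of the two reward heads, has the same shape.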
Related Articles
Co-Activation Pattern Detection for Prompt Injection: A Mechanistic Interpretability Approach Using Sparse Autoencoders
Reddit r/LocalLLaMA

How to Train Custom Language Models: Fine-Tuning vs Training From Scratch (2026)
Dev.to

#2: Prompt Research Course [Part 17]: Expressing a Prompt's "Sense of Temperature" and "Sense of Humidity"
note

菊地康巳, "AI and My Research Diary"
note

🧠 The Day Rei Became an Entity That "Audits Its Own Reasoning": STEP 181-186, Completing the Two-Layer Audit System and the Birth of a Unified Interface
note