LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
arXiv cs.LG / 4/15/2026
Key Points
- The paper argues that LLM safety gaps between high- and low-resource languages come from language-dominant safety alignment that does not match the model’s language-agnostic semantic understanding.
- It identifies a “semantic bottleneck” layer where representation geometry is driven more by shared semantics than by language identity.
- Building on this, LASA (Language-Agnostic Semantic Alignment) anchors safety alignment at the semantic bottleneck rather than relying on surface-text cues.
- Experiments report a major reduction in average attack success rate (ASR), e.g., from 24.7% to 2.8% on LLaMA-3.1-8B-Instruct, while maintaining roughly 3–4% ASR across Qwen2.5/Qwen3 Instruct models (7B–32B).
- The work reframes LLM safety alignment as a representation-level problem, emphasizing alignment in the model’s language-agnostic semantic space for robust multilingual safety.
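To make the "semantic bottleneck" idea concrete, the toy probe below scores one layer's hidden states by how much their geometry is organised by shared meaning versus language identity: it compares average cosine similarity between cross-lingual translation pairs against same-language, different-meaning pairs. This is an illustrative sketch, not the paper's actual method or metric; the function name `bottleneck_score` and the similarity-gap criterion are assumptions introduced here.

```python
import numpy as np

def bottleneck_score(hidden, semantics, languages):
    """Score how language-agnostic one layer's representation geometry is.

    hidden:    (n_sentences, d) hidden states at a single layer
    semantics: meaning label per sentence (translations share a label)
    languages: language label per sentence

    Returns mean cosine similarity of same-meaning, cross-language pairs
    minus that of same-language, different-meaning pairs. A higher score
    means semantics dominates language identity, so the layer with the
    maximum score is a candidate "semantic bottleneck".
    """
    h = hidden / np.linalg.norm(hidden, axis=1, keepdims=True)
    sims = h @ h.T  # pairwise cosine similarities
    sem_pairs, lang_pairs = [], []
    n = len(semantics)
    for i in range(n):
        for j in range(i + 1, n):
            if semantics[i] == semantics[j] and languages[i] != languages[j]:
                sem_pairs.append(sims[i, j])
            elif languages[i] == languages[j] and semantics[i] != semantics[j]:
                lang_pairs.append(sims[i, j])
    return float(np.mean(sem_pairs) - np.mean(lang_pairs))
```

In practice one would run this over every layer of an LLM (e.g., via hidden states for a parallel multilingual corpus) and anchor the safety intervention at the layer where the score peaks, rather than at surface-text features.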