LSR: Linguistic Safety Robustness Benchmark for Low-Resource West African Languages
arXiv cs.AI / 3/23/2026
📰 News · Models & Research
Key Points
- LSR introduces the first cross-lingual safety benchmark for West African languages (Yoruba, Hausa, Igbo, Igala) to measure how model refusal behavior degrades when harmful intent is stated in a local language.
- It uses a dual-probe evaluation protocol that submits matched English and target-language prompts to the same model to quantify cross-language refusal degradation.
- It proposes Refusal Centroid Drift (RCD), a metric that quantifies how much of a model's English refusal behavior is lost in a target language.
- The authors evaluate Gemini 2.5 Flash on 14 culturally grounded attack probes spanning four harm categories, finding English refusal rates of roughly 90%, while refusal rates in the West African languages drop to 35-55%, with Igala most affected (RCD = 0.55).
- The benchmark is implemented in Inspect AI and released as a PR-ready contribution to the UK AISI inspect_evals repository, with a live reference implementation and dataset publicly available.
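The dual-probe protocol and RCD metric described above can be sketched in a few lines. The summary does not give the exact RCD formula, so as a labeled assumption this sketch models RCD as the drop in refusal rate from English to the target language over the same matched probe set, which is consistent with the reported numbers (English ~0.90, Igala ~0.35-0.55, RCD = 0.55). The verdict data and function names are illustrative, not from the paper.

```python
# Hedged sketch of the dual-probe evaluation: the same harmful intent is
# submitted as matched English and target-language prompts, each response
# is judged as refusal (True) or compliance (False), and the cross-language
# gap is summarized. RCD is modeled here as a simple refusal-rate drop;
# the paper's actual definition may differ.

def refusal_rate(verdicts: list[bool]) -> float:
    """Fraction of probe responses judged to be refusals."""
    return sum(verdicts) / len(verdicts)

def refusal_centroid_drift(english: list[bool], target: list[bool]) -> float:
    """Assumed RCD: English refusal rate minus target-language refusal
    rate, computed over the same matched set of attack probes."""
    return refusal_rate(english) - refusal_rate(target)

# Illustrative verdicts for 14 matched probes (True = model refused).
english_verdicts = [True] * 13 + [False]      # ~0.93 refusal rate
igala_verdicts = [True] * 5 + [False] * 9     # ~0.36 refusal rate

rcd = refusal_centroid_drift(english_verdicts, igala_verdicts)
print(f"RCD = {rcd:.2f}")  # prints "RCD = 0.57" for this toy data
```

A higher RCD under this reading means more of the model's English-language refusal behavior is lost in the target language; RCD = 0 would mean refusals transfer perfectly.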