LLM Probe: Evaluating LLMs for Low-Resource Languages
arXiv cs.CL / 4/1/2026
Key Points
- The paper introduces LLM Probe, a lexicon-based framework for evaluating LLM capabilities in low-resource, morphologically rich languages using standardized linguistic probes.
- It assesses models across four task areas: lexical alignment, part-of-speech recognition, morphosyntactic probing, and translation accuracy.
- The authors create and release a manually annotated bilingual benchmark dataset for a low-resource Semitic language, including POS, grammatical gender, and morphosyntactic feature annotations with high inter-annotator agreement.
- Experimental results across causal and sequence-to-sequence models show trade-offs: sequence-to-sequence models tend to perform better on morphosyntax and translation, while causal models are stronger on lexical alignment.
- The work argues that linguistically grounded evaluation is necessary to understand LLM limitations in under-resourced settings; the framework and dataset are released as open source for reproducible benchmarking.
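
To make the lexicon-based evaluation idea concrete, here is a minimal sketch of how such a probe could score a model's predictions against gold annotations. All names, lexicon entries, and the baseline "model" below are illustrative assumptions, not taken from the paper or its released framework.

```python
# Hypothetical sketch of a lexicon-based probe: compare a model's
# predictions for POS and grammatical gender against a small gold lexicon.
# The lexicon entries and the baseline tagger are invented for illustration.

GOLD_LEXICON = {
    "bet":   {"pos": "NOUN", "gender": "MASC"},
    "katav": {"pos": "VERB", "gender": "MASC"},
    "yalda": {"pos": "NOUN", "gender": "FEM"},
}

def probe_accuracy(predict, feature):
    """Fraction of lexicon entries where the model matches the gold feature."""
    hits = sum(predict(word, feature) == entry[feature]
               for word, entry in GOLD_LEXICON.items())
    return hits / len(GOLD_LEXICON)

# A trivial stand-in "model" that tags every word as a masculine noun.
def baseline(word, feature):
    return {"pos": "NOUN", "gender": "MASC"}[feature]

pos_acc = probe_accuracy(baseline, "pos")
gender_acc = probe_accuracy(baseline, "gender")
print(f"POS accuracy: {pos_acc:.2f}, gender accuracy: {gender_acc:.2f}")
```

In a real probe the `predict` function would wrap an LLM prompt or a classifier head; the point is that a manually annotated lexicon turns open-ended generation into a per-feature accuracy score.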
