Beyond Word Boundaries: A Hebrew Coreference Benchmark and an Evaluation Protocol for Morphologically Complex Text
arXiv cs.CL / 4/21/2026
📰 NewsModels & Research
Key Points
- The paper proposes KibutzR, the first comprehensive coreference resolution dataset for Modern Hebrew, tailored to morphologically rich languages where mention boundaries often diverge from word boundaries.
- It introduces a segmentation-aware evaluation protocol that scores coreference across word, sub-word, and multi-word mention levels to handle morpheme/token boundary discrepancies.
- Experiments show that current LLMs perform substantially worse on Hebrew than on English, and that performance drops further when using raw, unsegmented text.
- The study finds an unexpected inverse trend versus English: in Hebrew, smaller encoder-style models outperform decoder-style models, suggesting different modeling and evaluation needs for MRLs.
- The benchmark and protocol are intended to guide future work on coreference and evaluation for other morphologically complex languages beyond Hebrew.
Related Articles

We built it during the NVIDIA DGX Spark Full-Stack AI Hackathon — and it ended up winning 1st place overall 🏆
Dev.to

Stop Losing Progress: Setting Up a Pro Jupyter Workflow in VS Code (No More Colab Timeouts!)
Dev.to

Building AgentOS: Why I’m Building the AWS Lambda for Insurance Claims
Dev.to

Where we are. In a year, everything has changed. Kimi - Minimax - Qwen - Gemma - GLM
Reddit r/LocalLLaMA
Where is Grok-2 Mini and Grok-3 (mini)?
Reddit r/LocalLLaMA