ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries
arXiv cs.CL / 3/20/2026
Key Points
- ELM is a hybrid ensemble that combines six encoder-only language models (three for the top portion and three for the bottom portion of each report) with a large language model. A tumor group is assigned directly when at least five of the six encoders agree; otherwise, the large language model arbitrates.
- On a held-out test set of 2,058 pathology reports across 19 tumor groups, ELM achieves a weighted precision and recall of 0.94, significantly outperforming encoder-only ensembles (0.91 F1) and rule-based approaches (p<0.001).
- In production at the British Columbia Cancer Registry, ELM reduced manual review by about 60–70%, saving an estimated 900 person-hours annually while maintaining data quality.
- The study claims this is the first successful deployment of a hybrid architecture, pairing small encoder-only models with an LLM, for tumor group classification in a real-world population-based cancer registry setting.
- ELM shows substantial F1-score improvements in challenging categories such as leukemia, lymphoma, and skin cancer.
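
The voting rule in the first key point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it reads the rule as "accept the majority label when at least five of six encoders agree, otherwise defer to the LLM". The function name `assign_tumor_group`, the `arbiter` callable, and the choice to pass the LLM only the labels the encoders proposed are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an LLM call: takes candidate labels, returns one.
Arbiter = Callable[[List[str]], str]


def assign_tumor_group(
    encoder_votes: Sequence[str],
    arbiter: Arbiter,
    min_agreement: int = 5,  # assumed threshold: five of six encoders
) -> Tuple[str, str]:
    """Return (label, source), where source is 'ensemble' or 'llm'."""
    if len(encoder_votes) != 6:
        raise ValueError("expected one vote from each of the six encoders")

    counts = Counter(encoder_votes)
    top_label, top_count = counts.most_common(1)[0]

    # Strong agreement: accept the encoders' label without calling the LLM.
    if top_count >= min_agreement:
        return top_label, "ensemble"

    # Otherwise defer to the arbiter. Restricting it to the labels the
    # encoders proposed (most-voted first) is an assumption of this sketch.
    candidates = [label for label, _ in counts.most_common()]
    return arbiter(candidates), "llm"


if __name__ == "__main__":
    # Toy arbiter that picks the first candidate; a real one would query an LLM.
    pick_first = lambda candidates: candidates[0]
    print(assign_tumor_group(["lung"] * 5 + ["breast"], pick_first))
    print(assign_tumor_group(["leukemia"] * 3 + ["lymphoma"] * 3, pick_first))
```

Routing only the low-agreement reports to the LLM keeps the expensive model off the majority of cases, which is consistent with the reduction in manual review reported above.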