ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries
arXiv cs.CL / 3/20/2026
Key Points
- ELM is a hybrid ensemble that combines six encoder-only language models (three that read the top portion of each report and three that read the bottom portion) with a large language model. A tumor group is assigned when at least five of the six encoders agree; otherwise the LLM arbitrates.
- On a held-out test set of 2,058 pathology reports across 19 tumor groups, ELM achieves a weighted precision and recall of 0.94, significantly outperforming encoder-only ensembles (0.91 F1) and rule-based approaches (p<0.001).
- In production at the British Columbia Cancer Registry, ELM reduced manual review by about 60–70%, saving an estimated 900 person-hours annually while maintaining data quality.
- The study claims this is the first successful deployment of a hybrid architecture that pairs small encoder-only models with an LLM for tumor group classification in a real-world population-based cancer registry setting.
- ELM's F1-score improvements are largest in challenging categories such as leukemia, lymphoma, and skin cancer.
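
The voting rule in the first key point can be illustrated with a minimal Python sketch. It assumes the encoders' majority label is accepted when at least five of the six agree, and that the LLM is consulted only otherwise. The names `classify_report`, `fake_arbiter`, and `AGREEMENT_THRESHOLD` are hypothetical, and the arbiter here is a stand-in that sees only the encoder votes; a real arbiter would presumably also read the report text. This is not the authors' implementation.

```python
from collections import Counter
from typing import Callable, Sequence

# Assumption: "five of six" is read as "at least five of six".
AGREEMENT_THRESHOLD = 5


def classify_report(
    encoder_votes: Sequence[str],
    llm_arbiter: Callable[[Sequence[str]], str],
    threshold: int = AGREEMENT_THRESHOLD,
) -> tuple[str, str]:
    """Return (tumor_group, decided_by) for a single report."""
    if not encoder_votes:
        raise ValueError("need at least one encoder vote")
    # Most frequent label and the number of encoders that chose it.
    label, count = Counter(encoder_votes).most_common(1)[0]
    if count >= threshold:
        return label, "encoders"
    # Too little agreement: defer to the (hypothetical) LLM arbiter.
    return llm_arbiter(encoder_votes), "llm"


def fake_arbiter(votes: Sequence[str]) -> str:
    """Stand-in for an LLM call: picks the alphabetically first candidate."""
    return sorted(set(votes))[0]


if __name__ == "__main__":
    # Five of six encoders agree, so the LLM is not consulted.
    print(classify_report(["lung"] * 5 + ["breast"], fake_arbiter))
    # A three-three split falls below the threshold, so the arbiter decides.
    print(classify_report(["lung"] * 3 + ["breast"] * 3, fake_arbiter))
```

Routing only the low-agreement reports to the LLM keeps the more expensive model off the cases the encoders already settle, which fits the reported reduction in manual review.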