ELM: A Hybrid Ensemble of Language Models for Automated Tumor Group Classification in Population-Based Cancer Registries

arXiv cs.CL / 3/20/2026

Key Points

ELM is a hybrid ensemble that combines six fine-tuned encoder-only language models (three reading the top portion and three reading the bottom portion of each report) with a large language model. A tumor group is assigned when at least five of the six encoders agree; otherwise the LLM arbitrates.
On a held-out test set of 2,058 pathology reports across 19 tumor groups, ELM achieves weighted precision and recall of 0.94, a statistically significant improvement (p<0.001) over encoder-only ensembles (0.91 F1-score), and it substantially outperforms rule-based approaches.
In production at the British Columbia Cancer Registry, ELM reduced manual review by about 60–70%, saving an estimated 900 person-hours annually while maintaining data quality.
The study claims this is the first successful deployment of a hybrid architecture, pairing small encoder-only models with an LLM, for tumor group classification in a real-world population-based cancer registry setting.
ELM delivers its largest gains in challenging categories: leukemia (F1 0.76 to 0.88), lymphoma (0.76 to 0.89), and skin cancer (0.44 to 0.58).
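
The agreement rule in the key points can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `assign_tumor_group`, the `llm_arbiter` callable, and the choice to pass the encoders' distinct votes as the "likely tumor groups" are all hypothetical, since the summary does not say how the candidate list is built.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Sequence

# From the summary: a group is assigned when at least five of six encoders agree.
AGREEMENT_THRESHOLD = 5
NUM_ENCODERS = 6


def assign_tumor_group(
    encoder_votes: Sequence[str],
    llm_arbiter: Callable[[list[str]], str],
) -> tuple[str, str]:
    """Return (tumor_group, source), where source is "ensemble" or "llm"."""
    if len(encoder_votes) != NUM_ENCODERS:
        raise ValueError("expected one vote from each of the six encoders")

    counts = Counter(encoder_votes)
    top_label, top_count = counts.most_common(1)[0]
    if top_count >= AGREEMENT_THRESHOLD:
        return top_label, "ensemble"

    # Assumption: the "likely tumor groups" are the distinct labels the
    # encoders voted for, ordered from most to least votes.
    candidates = [label for label, _ in counts.most_common()]
    return llm_arbiter(candidates), "llm"


def stub_arbiter(candidates: list[str]) -> str:
    """Hypothetical stand-in for the LLM call; picks the last candidate."""
    return candidates[-1]


agreed = assign_tumor_group(["lymphoma"] * 5 + ["leukemia"], stub_arbiter)
split = assign_tumor_group(["lymphoma"] * 3 + ["leukemia"] * 3, stub_arbiter)
```

In this sketch a 5-to-1 vote is settled by the ensemble alone, while a 3-to-3 split is handed to the arbiter with only the two contested labels as options.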

Abstract

Background: Population-based cancer registries (PBCRs) manually extract data from unstructured pathology reports, a labor-intensive process where assigning reports to tumor groups can consume 900 person-hours annually for approximately 100,000 reports at a medium-sized registry. Current automated rule-based systems fail to handle the linguistic complexity of this classification task.

Materials and Methods: We present ELM (Ensemble of Language Models), a novel hybrid approach combining small, encoder-only language models and large language models (LLMs). ELM employs an ensemble of six fine-tuned encoder-only models: three analyzing the top portion and three analyzing the bottom portion of each report to maximize text coverage given token limits. A tumor group is assigned when at least five of six models agree; otherwise, an LLM arbitrates using a carefully curated prompt constrained to likely tumor groups.

Results: On a held-out test set of 2,058 pathology reports spanning 19 tumor groups, ELM achieves weighted precision and recall of 0.94, representing a statistically significant improvement (p<0.001) over encoder-only ensembles (0.91 F1-score) and substantially outperforming rule-based approaches. ELM demonstrates particular gains for challenging categories including leukemia (F1: 0.76 to 0.88), lymphoma (0.76 to 0.89), and skin cancer (0.44 to 0.58).

Discussion: Deployed in production at the British Columbia Cancer Registry, ELM has reduced manual review requirements by approximately 60-70%, saving an estimated 900 person-hours annually while maintaining data quality standards.

Conclusion: ELM represents the first successful deployment of a hybrid architecture combining small encoder-only models with an LLM for tumor group classification in a real-world PBCR setting, demonstrating how strategic combination of language models can achieve both high accuracy and operational efficiency.
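
The idea of reading the top and bottom portions of a report to stay within token limits can be illustrated with a short Python sketch. Everything specific here is an assumption for illustration: the helper name `top_and_bottom_windows`, the `max_tokens` parameter, and the use of whitespace-split words in place of a real model tokenizer. The abstract does not state the actual token limit or how the portions are delimited.

```python
from __future__ import annotations


def top_and_bottom_windows(
    tokens: list[str], max_tokens: int
) -> tuple[list[str], list[str]]:
    """Return the first and last `max_tokens` tokens of a report.

    Hypothetical helper: one window would feed the "top" encoders and the
    other the "bottom" encoders. For reports shorter than `max_tokens`,
    both windows contain the whole report.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return tokens[:max_tokens], tokens[-max_tokens:]


# Whitespace splitting stands in for a real tokenizer (assumption).
report = "specimen skin left arm diagnosis melanoma in situ margins clear"
top, bottom = top_and_bottom_windows(report.split(), max_tokens=4)
```

With a limit of four tokens, `top` holds the opening words of the report and `bottom` holds the closing words, so text that a single truncated window would miss still reaches one of the two encoder groups.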
