Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods

arXiv cs.CL / 4/9/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The study presents an LLM-based framework for extracting structured breast cancer phenotypes (e.g., treatment outcomes, biomarkers, tumor location, size, and growth patterns) from unstructured oncology clinical notes in EMRs.
It evaluates the LLM approach against earlier ontology/knowledge-driven methods that use the NCIt Ontology Annotator for annotation.
Results indicate the LLM information-extraction framework can achieve accuracy comparable to classical ontology-based methods while leveraging natural-language notes.
The authors argue the trained framework is adaptable and can be fine-tuned to cover other cancer types and diseases beyond breast cancer.

Abstract

A significant amount of data held in Oncology Electronic Medical Records (EMRs) is contained in unstructured provider notes -- including but not limited to the chemotherapy (or cancer treatment) outcome, different biomarkers, the tumor's location, sizes, and growth patterns of a patient. The clinical studies show that the majority of oncologists are comfortable providing these valuable insights in their notes in a natural language rather than the relevant structured fields of an EMR. The major contribution of this research is to report an LLM-based framework to process provider notes and extract valuable medical knowledge and phenotype mentioned above, with a focus on the domain of oncology. In this paper, we focus on extracting phenotypes related to breast cancer using our LLM framework, and then compare its performance with earlier works that used knowledge-driven annotation system, paired with the NCIt Ontology Annotator. The results of the study show that an LLM-based information extraction framework can be easily adapted to extract phenotypes with an accuracy that is comparable to the classical ontology-based methods. However, once trained, they could be easily fine-tuned to cater for other cancer types and diseases.

Black Hat Asia

AI Business

Amazon CEO takes aim at Nvidia, Intel, Starlink, more in annual shareholder letter

TechCrunch

Why Anthropic’s new model has cybersecurity experts rattled

Reddit r/artificial

Does the AI 2027 paper still hold any legitimacy?

Reddit r/artificial

Why Most Productivity Systems Fail (And What to Do Instead)

Dev.to

Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods

Key Points

Abstract

Related Articles

Black Hat Asia

Amazon CEO takes aim at Nvidia, Intel, Starlink, more in annual shareholder letter

Why Anthropic’s new model has cybersecurity experts rattled

Does the AI 2027 paper still hold any legitimacy?

Why Most Productivity Systems Fail (And What to Do Instead)

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer