AutoPCR: Automated Phenotype Concept Recognition by Prompting

arXiv cs.CL / 4/6/2026


Key Points

  • AutoPCR is a prompt-based phenotype concept recognition (CR) approach for biomedical text mining, where phenotype mentions in text must be mapped to ontology concepts.
  • The method is designed to generalize across new ontologies and previously unseen data without requiring ontology-specific training, addressing a key weakness of many prior CR systems.
  • AutoPCR also offers an optional self-supervised training strategy that further improves performance.
  • Experiments show that AutoPCR achieves the best average performance and the most robust performance across datasets; ablation and transfer studies further demonstrate its inductive capability and its generalizability to new ontologies.
  • The authors release their implementation on GitHub for reproducibility and downstream use.
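To make the CR task concrete, here is a minimal sketch of the kind of output a concept recognizer produces: character spans in the input text, each paired with an ontology concept ID. It is an illustration under stated assumptions, not AutoPCR's prompt-based method. The matcher is a naive case-insensitive exact-match lookup, and the three-entry lexicon is a tiny hand-picked excerpt of Human Phenotype Ontology (HPO) terms chosen for this example; the names `LEXICON`, `ConceptMatch`, and `recognize_concepts` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical, hand-picked lexicon for illustration: surface form -> HPO concept ID.
LEXICON = {
    "seizure": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
}


@dataclass(frozen=True)
class ConceptMatch:
    start: int       # character offset where the mention begins
    end: int         # character offset just past the mention
    mention: str     # mention text exactly as it appears in the input
    concept_id: str  # ontology concept the mention is mapped to


def recognize_concepts(text: str, lexicon: dict[str, str]) -> list[ConceptMatch]:
    """Naive baseline (not AutoPCR): whole-word, case-insensitive exact matching."""
    matches = []
    for surface, concept_id in lexicon.items():
        pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append(ConceptMatch(m.start(), m.end(), m.group(0), concept_id))
    # Report matches in the order they occur in the text.
    return sorted(matches, key=lambda c: c.start)


if __name__ == "__main__":
    note = "The patient has microcephaly and global developmental delay; one seizure was recorded."
    for c in recognize_concepts(note, LEXICON):
        print(c.concept_id, repr(c.mention), (c.start, c.end))
```

A lookup like this fails on paraphrases (for example, "fits" for a seizure) and on terms missing from its lexicon, which illustrates why CR systems need to generalize beyond the surface forms they were built with.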

Abstract

Motivation: Phenotype concept recognition (CR) is a fundamental task in biomedical text mining. However, existing methods either require ontology-specific training, which limits their ability to generalize across diverse text styles and evolving biomedical terminology, or depend on general-purpose large language models (LLMs) that lack the necessary domain knowledge.

Results: To address these limitations, we propose AutoPCR, a prompt-based phenotype CR method designed to generalize automatically to new ontologies and unseen data without ontology-specific training. To further boost performance, we also introduce an optional self-supervised training strategy. Experiments show that AutoPCR achieves the best average and most robust performance across datasets. Further ablation and transfer studies demonstrate its inductive capability and its generalizability to new ontologies.

Availability and Implementation: Our code is available at https://github.com/yctao7/AutoPCR.

Contact: drjieliu@umich.edu