Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs

arXiv cs.CL / March 30, 2026


Key Points

  • The study benchmarks multiple BERT-family models and LLMs for Portuguese clinical named entity recognition (NER) using the SemClinBr public corpus and a private breast cancer dataset.
  • It compares models including BioBERTpt, BERTimbau, ModernBERT, and mmBERT against LLMs such as GPT-5 and Gemini-2.5 under identical training and evaluation conditions.
  • mmBERT-base achieved the best reported performance with micro F1 = 0.76, outperforming the other tested models and indicating strong suitability for Portuguese clinical NER.
  • The paper tests data-imbalance mitigation strategies (iterative stratification, weighted loss, oversampling), finding that iterative stratification improves class balance and overall results.
  • It concludes that multilingual BERT models—especially mmBERT—are effective for Portuguese clinical NER and can run locally on limited computational resources, with balanced data-splitting strategies further improving performance.
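The paper does not publish its splitting code, but the idea behind iterative stratification can be illustrated with a simplified, self-contained sketch (loosely following Sechidis et al.'s greedy algorithm): repeatedly take the rarest remaining label and assign its samples to the fold that most needs that label, so rare entity types stay represented in every split. The function name and data layout below are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def iterative_stratify(label_sets, n_splits=5, seed=42):
    """Greedy multilabel stratification sketch.

    label_sets: one set of labels per sample, e.g. [{"DRUG"}, {"DRUG", "DOSE"}].
    Returns n_splits lists of sample indices forming a partition.
    """
    rng = random.Random(seed)
    n = len(label_sets)
    desired = [n / n_splits] * n_splits  # remaining capacity per fold
    label_counts = defaultdict(int)
    for labels in label_sets:
        for lab in labels:
            label_counts[lab] += 1
    # desired number of examples of each label per fold
    desired_per_label = {lab: [c / n_splits] * n_splits
                         for lab, c in label_counts.items()}
    folds = [[] for _ in range(n_splits)]
    remaining = set(range(n))
    while remaining:
        # find the rarest label among the unassigned samples
        counts = defaultdict(list)
        for i in remaining:
            for lab in label_sets[i]:
                counts[lab].append(i)
        if counts:
            lab, members = min(counts.items(), key=lambda kv: len(kv[1]))
        else:
            lab, members = None, list(remaining)  # label-free samples
        for i in members:
            # pick the fold with the greatest remaining need for this
            # label; break ties by overall capacity, then randomly
            if lab is not None:
                need = desired_per_label[lab]
                best = max(range(n_splits),
                           key=lambda f: (need[f], desired[f], rng.random()))
            else:
                best = max(range(n_splits),
                           key=lambda f: (desired[f], rng.random()))
            folds[best].append(i)
            remaining.discard(i)
            desired[best] -= 1
            for l2 in label_sets[i]:
                desired_per_label[l2][best] -= 1
    return folds
```

In practice one would use a tested library (e.g. scikit-multilearn's iterative splitter) rather than this sketch, but the greedy rarest-label-first loop is the core mechanism that keeps infrequent clinical entity classes from vanishing from a fold.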

Abstract

Clinical notes contain valuable unstructured information. Named entity recognition (NER) enables the automatic extraction of medical concepts; however, benchmarks for Portuguese remain scarce. In this study, we aimed to evaluate BERT-based models and large language models (LLMs) for clinical NER in Portuguese and to test strategies for addressing multilabel imbalance. We compared BioBERTpt, BERTimbau, ModernBERT, and mmBERT with LLMs such as GPT-5 and Gemini-2.5, using the public SemClinBr corpus and a private breast cancer dataset. Models were trained under identical conditions and evaluated using precision, recall, and F1-score. Iterative stratification, weighted loss, and oversampling were explored to mitigate class imbalance. The mmBERT-base model achieved the best performance (micro F1 = 0.76), outperforming all other models. Iterative stratification improved class balance and overall performance. Multilingual BERT models, particularly mmBERT, perform strongly for Portuguese clinical NER and can run locally with limited computational resources. Balanced data-splitting strategies further enhance performance.