Clinical named entity recognition in the Portuguese language: a benchmark of modern BERT models and LLMs
arXiv cs.CL / 3/30/2026
Key Points
- The study benchmarks multiple BERT-family models and LLMs for Portuguese clinical named entity recognition (NER) using the SemClinBr public corpus and a private breast cancer dataset.
- It compares models including BioBERTpt, BERTimbau, ModernBERT, and mmBERT against LLMs such as GPT-5 and Gemini-2.5 under identical training and evaluation conditions.
- mmBERT-base achieved the best reported performance with micro F1 = 0.76, outperforming the other tested models and indicating strong suitability for Portuguese clinical NER.
- The paper tests data-imbalance mitigation strategies (iterative stratification, weighted loss, oversampling), finding that iterative stratification improves class balance and overall results.
- It concludes that multilingual BERT models, especially mmBERT, are effective for Portuguese clinical NER and can be run locally on limited computational resources when paired with balanced splitting strategies.
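The iterative stratification mentioned above (commonly attributed to Sechidis et al., 2011, and available off the shelf in libraries such as scikit-multilearn) greedily assigns samples of the rarest remaining label first, so that even infrequent entity types end up spread across folds. The paper does not publish its splitting code; the following is a minimal, self-contained sketch of the idea, where sample labels, fold count, and tie-breaking are illustrative assumptions:

```python
import random
from collections import Counter

def iterative_stratification(labels, k, seed=0):
    """Greedy multi-label stratified split (sketch after Sechidis et al., 2011).

    labels: list where labels[i] is the set of entity-type labels
            present in clinical note i (illustrative input format).
    k:      number of folds.
    Returns k lists of sample indices with approximately balanced
    label distributions.
    """
    rng = random.Random(seed)
    n = len(labels)
    # Ideal number of examples of each label per fold.
    total = Counter(l for labs in labels for l in labs)
    desired = {l: c / k for l, c in total.items()}

    folds = [[] for _ in range(k)]
    fold_counts = [Counter() for _ in range(k)]
    remaining = set(range(n))

    while remaining:
        # Count remaining examples per label; handle the rarest label first.
        rem = Counter(l for i in remaining for l in labels[i])
        if rem:
            rare = min(rem, key=rem.get)
            pool = [i for i in remaining if rare in labels[i]]
        else:  # unlabeled samples: just balance fold sizes
            rare, pool = None, list(remaining)
        for i in pool:
            if rare is not None:
                # Fold with the largest unmet demand for the rare label;
                # break ties by smallest fold, then randomly.
                f = max(range(k),
                        key=lambda f: (desired[rare] - fold_counts[f][rare],
                                       -len(folds[f]), rng.random()))
            else:
                f = min(range(k), key=lambda f: len(folds[f]))
            folds[f].append(i)
            fold_counts[f].update(labels[i])
            remaining.discard(i)
    return folds
```

For a clinical NER corpus, `labels[i]` would be the entity types annotated in document i; rare types (say, a seldom-seen "Procedure" class) are distributed before frequent ones, which is why this tends to beat random splitting under class imbalance.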