Large Language Models for Biomedical Article Classification
arXiv cs.CL / March 13, 2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The study systematically evaluates large language models as text classifiers for biomedical article classification, comparing small and mid-size open-source models as well as selected closed-source models across prompts, output processing, few-shot example counts, and selection methods.
- Across 15 challenging datasets, zero-shot prompting achieves average PR AUC above 0.4 and few-shot prompting around 0.5, approaching the performance of Naive Bayes, random forests, and fine-tuned transformer baselines.
- The results indicate that using output token probabilities for class probability prediction is a particularly promising setup.
- The work offers practical recommendations for applying LLM classifiers and extends prior work by evaluating a substantially wider range of configurations.
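The token-probability setup highlighted above can be sketched as follows: instead of parsing the model's generated text, read the log-probabilities of the first output token, restrict them to the tokens that name a class, and renormalize with a softmax to obtain calibrated-style class probabilities. This is a minimal illustration, not the paper's implementation; the log-probability values below are hypothetical stand-ins for what an LLM API would return.

```python
import math

def class_probabilities(token_logprobs, class_tokens):
    """Turn first-token log-probabilities into class probabilities.

    token_logprobs: dict mapping token string -> log-probability,
        as returned (hypothetically) by an LLM API for the first
        generated token.
    class_tokens: the tokens that serve as class labels.
    """
    # Keep only the class-label tokens and renormalize via softmax
    logits = [token_logprobs[t] for t in class_tokens]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return {t: e / z for t, e in zip(class_tokens, exps)}

# Hypothetical log-probabilities over the model's first output token
logprobs = {"yes": -0.3, "no": -1.5, "maybe": -4.0}
probs = class_probabilities(logprobs, ["yes", "no"])
# probs["yes"] ≈ 0.77, probs["no"] ≈ 0.23
```

Because this yields a continuous score per article rather than a hard label, it is what makes threshold-free metrics such as PR AUC directly computable for prompted classifiers.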