Large Language Models for Biomedical Article Classification
arXiv cs.CL / 3/13/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The study systematically evaluates large language models as text classifiers for biomedical article classification, comparing small and mid-size open-source models and selected closed-source models across prompt designs, output-processing strategies, few-shot example counts, and example selection methods.
- Across 15 challenging datasets, zero-shot prompting achieves average PR AUC above 0.4 and few-shot prompting around 0.5, approaching the performance of Naive Bayes, random forests, and fine-tuned transformer baselines.
- The results indicate that using output token probabilities for class probability prediction is a particularly promising setup.
- The work provides practical recommendations and broadens prior work by evaluating a wider range of configurations.
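The third key point highlights using output token probabilities to predict class probabilities. A minimal sketch of that idea, assuming a binary yes/no classification prompt: the log-probabilities the model assigns to the candidate answer tokens are renormalized into a class probability. All names here (`class_probability`, the token sets, the example log-probs) are illustrative, not from the paper.

```python
import math

def class_probability(token_logprobs,
                      pos_tokens=("Yes", "yes"),
                      neg_tokens=("No", "no")):
    """Renormalize first-token probability mass over the label tokens.

    token_logprobs: dict mapping candidate first tokens to log-probabilities,
    as returned by an LLM API for the generated token (hypothetical input).
    Returns P(positive class) after restricting to the label tokens.
    """
    pos = sum(math.exp(token_logprobs[t]) for t in pos_tokens if t in token_logprobs)
    neg = sum(math.exp(token_logprobs[t]) for t in neg_tokens if t in token_logprobs)
    total = pos + neg
    if total == 0:
        return 0.5  # no mass on either label token: fall back to uninformative prior
    return pos / total

# Example: the model puts most first-token mass on "Yes"
logprobs = {"Yes": -0.2, "No": -1.8, "the": -3.0}
p = class_probability(logprobs)
```

Because this yields a graded score rather than a hard label, it supports threshold-free metrics such as the PR AUC the study reports.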