RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI

arXiv cs.AI / 5/4/2026


Key Points

  • The study proposes RadLite, showing that 3–4B parameter small language models can deliver strong multi-task radiology performance by using LoRA fine-tuning rather than relying on resource-heavy LLM deployment.
  • Researchers fine-tuned Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples covering nine radiology tasks (including RADS classification, impression generation, NLI/NER, staging, abnormality detection, and radiology Q&A) compiled from 12 public datasets.
  • LoRA fine-tuning substantially outperforms zero-shot baselines, with reported gains such as RADS accuracy +53%, NLI +60%, and N-staging +89%.
  • The two models provide complementary capabilities (Qwen2.5 better at structured generation, Qwen3 stronger at extractive tasks), and a task-specific oracle ensemble of both yields the best overall results.
  • For real-world deployment, the models can be quantized to GGUF (~1.8–2.4GB), enabling CPU-only inference at roughly 4–8 tokens/second on consumer hardware. The authors also find that few-shot prompting with the fine-tuned models can reduce performance, suggesting LoRA adaptation works better than in-context learning for this domain.
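The LoRA adaptation the bullets describe boils down to learning a low-rank additive update to each frozen weight matrix. The sketch below shows the core arithmetic (W + (alpha/r)·B·A, with B zero-initialized so training starts exactly at the base model); the matrix sizes are toy values, not the actual Qwen layer shapes, and the variable names follow the standard LoRA formulation rather than the paper's code.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def lora_effective_weight(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged LoRA weight.

    W: (d_out, d_in) frozen base weight
    A: (r, d_in) trainable down-projection (random init)
    B: (d_out, r) trainable up-projection (zero init, so the model
       starts training exactly at the base weights)
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy example: d_out=2, d_in=3, rank r=1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]          # (1, 3)
B_zero = [[0.0], [0.0]]        # zero init: merged weight equals W
B_trained = [[2.0], [4.0]]     # illustrative values after training

assert lora_effective_weight(W, A, B_zero, alpha=2, r=1) == W
merged = lora_effective_weight(W, A, B_trained, alpha=2, r=1)
print(merged[0])  # -> [3.0, 2.0, 2.0]
```

Because only A and B are trained (r·(d_in + d_out) parameters per layer instead of d_in·d_out), fine-tuning a 3–4B model becomes tractable, and the update can be merged back into W for deployment at zero inference cost.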

Abstract

Large language models (LLMs) show promise in radiology but their deployment is limited by computational requirements that preclude use in resource-constrained clinical environments. We investigate whether small language models (SLMs) of 3-4 billion parameters can achieve strong multi-task radiology performance through LoRA fine-tuning, enabling deployment on consumer-grade CPUs. We train Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples spanning 9 radiology tasks - RADS classification across 10 systems, impression generation, temporal comparison, radiology NLI, NER, abnormality detection, N/M staging, and radiology Q&A - compiled from 12 public datasets. Both models are evaluated on up to 500 held-out test samples per task with standardized metrics. Our key findings are: (1) LoRA fine-tuning dramatically improves performance over zero-shot baselines (RADS accuracy +53%, NLI +60%, N-staging +89%); (2) the two models exhibit complementary strengths - Qwen2.5 excels at structured generation tasks while Qwen3 dominates extractive tasks; (3) a task-routed oracle ensemble combining both models achieves the best performance across all tasks; (4) few-shot prompting with fine-tuned models hurts performance, demonstrating that LoRA adaptation is more effective than in-context learning for specialized domains; and (5) models can be quantized to GGUF format (~1.8-2.4GB) for CPU deployment at 4-8 tokens/second on consumer hardware. Our work demonstrates that small, efficiently fine-tuned models - which we collectively call RadLite - can serve as practical multi-task radiology AI assistants deployable entirely on consumer hardware without GPU requirements.
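The "task-routed oracle ensemble" amounts to a per-task dispatch table: each task is sent to whichever fine-tuned model scored higher on held-out data for that task. A minimal sketch, with illustrative scores and task names (not the paper's actual numbers):

```python
# Per-task validation scores for the two fine-tuned models.
# All values below are made-up placeholders for illustration.
ILLUSTRATIVE_VAL_SCORES = {
    "rads_classification":   {"qwen2.5-3b": 0.81, "qwen3-4b": 0.77},
    "impression_generation": {"qwen2.5-3b": 0.64, "qwen3-4b": 0.58},
    "radiology_ner":         {"qwen2.5-3b": 0.70, "qwen3-4b": 0.83},
    "radiology_nli":         {"qwen2.5-3b": 0.72, "qwen3-4b": 0.79},
}

def build_router(val_scores):
    """Map each task to the model with the best validation score."""
    return {task: max(scores, key=scores.get)
            for task, scores in val_scores.items()}

def route(task, router, default="qwen3-4b"):
    """Pick a model for a task; fall back to a default for unseen tasks."""
    return router.get(task, default)

router = build_router(ILLUSTRATIVE_VAL_SCORES)
print(route("radiology_ner", router))          # extractive -> qwen3-4b
print(route("impression_generation", router))  # generation -> qwen2.5-3b
```

Note this is an "oracle" in the sense that routing is decided from known per-task scores rather than learned at inference time; in practice both quantized models would need to be resident, roughly doubling the memory footprint.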
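The reported GGUF sizes (~1.8-2.4GB) are consistent with a mid-range 4-bit quantization scheme. A back-of-envelope check, where the bits-per-weight figure is a typical value for common llama.cpp quantization formats (an assumption, not a number from the paper):

```python
def gguf_size_gb(n_params, bits_per_weight):
    """Approximate quantized model size in GB: params * bits / 8."""
    return n_params * bits_per_weight / 8 / 1e9

# ~4.8 effective bits per weight is in the typical range for 4-bit
# llama.cpp quantization (illustrative assumption).
size_3b = gguf_size_gb(3.0e9, 4.8)
size_4b = gguf_size_gb(4.0e9, 4.8)
print(f"3B model: ~{size_3b:.1f} GB, 4B model: ~{size_4b:.1f} GB")
# -> 3B model: ~1.8 GB, 4B model: ~2.4 GB
```

Both estimates land inside the paper's reported ~1.8-2.4GB range, which is what makes CPU-only deployment plausible: the quantized weights fit comfortably in the RAM of an ordinary laptop.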