RadLite: Multi-Task LoRA Fine-Tuning of Small Language Models for CPU-Deployable Radiology AI
arXiv cs.AI / 5/4/2026
💬 Opinion · Developer Stack & Infrastructure · Tools & Practical Usage · Models & Research
Key Points
- The study proposes RadLite, showing that 3–4B-parameter small language models can deliver strong multi-task radiology performance through LoRA fine-tuning, avoiding resource-heavy LLM deployment (see the fine-tuning sketch after this list).
- Researchers fine-tuned Qwen2.5-3B-Instruct and Qwen3-4B on 162K samples covering nine radiology tasks (including RADS classification, impression generation, NLI/NER, staging, abnormality detection, and radiology Q&A) compiled from 12 public datasets.
- LoRA fine-tuning substantially outperforms zero-shot baselines, with reported gains such as RADS accuracy +53%, NLI +60%, and N-staging +89%.
- The two models provide complementary capabilities (Qwen2.5 is better at structured generation, Qwen3 is stronger on extractive tasks), and a task-specific oracle ensemble of the two yields the best overall results (see the routing sketch below).
- For real-world deployment, the models can be quantized to GGUF (~1.8–2.4 GB), enabling CPU-only inference at about 4–8 tokens/second on consumer hardware (see the inference sketch below). The authors also find that few-shot prompting can degrade the fine-tuned models' performance, suggesting LoRA adaptation works better than in-context learning for this domain.
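
As a concrete illustration of the fine-tuning recipe, here is a minimal sketch using Hugging Face PEFT and TRL. The LoRA rank, alpha, target modules, and dataset file are illustrative assumptions, not the paper's reported configuration.

```python
# A minimal sketch of LoRA fine-tuning a small instruction model with
# Hugging Face PEFT + TRL. Hyperparameters and file names are assumed.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical multi-task file: one JSON object per line with a "text"
# field holding the instruction-formatted prompt plus target output,
# mixed across the nine radiology tasks.
dataset = load_dataset("json", data_files="radiology_multitask.jsonl")["train"]

# LoRA trains small low-rank adapters on the attention projections
# while the 3B base weights stay frozen.
lora_config = LoraConfig(
    r=16,                    # assumed adapter rank
    lora_alpha=32,           # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-3B-Instruct",  # or "Qwen/Qwen3-4B"
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(output_dir="radlite-qwen2.5-3b-lora"),
)
trainer.train()
```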
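The "oracle ensemble" amounts to routing each task to whichever fine-tuned model scored better on it during evaluation. A toy sketch of that routing follows; the task names and model assignments are placeholders, not the paper's measured winners.

```python
# Toy sketch of task-specific oracle routing: serve each task with the
# fine-tuned checkpoint that evaluated best on it. Assignments below
# are assumptions for illustration.
BEST_MODEL_FOR_TASK = {
    "rads_classification": "radlite-qwen3-4b-lora",
    "impression_generation": "radlite-qwen2.5-3b-lora",
    "nli": "radlite-qwen3-4b-lora",
    "ner": "radlite-qwen3-4b-lora",
    "staging": "radlite-qwen2.5-3b-lora",
}

def route(task: str) -> str:
    """Return the checkpoint name to serve a given radiology task."""
    return BEST_MODEL_FOR_TASK[task]
```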
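For the CPU-deployment path, here is a sketch of running a quantized GGUF export with llama-cpp-python. The filename, quantization level, context size, and thread count are assumptions; the GGUF file itself would typically come from llama.cpp's conversion and quantization tooling.

```python
# A minimal sketch of CPU-only inference on a quantized GGUF export
# via llama-cpp-python. File name and Q4_K_M quant level are assumed.
from llama_cpp import Llama

llm = Llama(
    model_path="radlite-qwen2.5-3b-q4_k_m.gguf",  # hypothetical file
    n_ctx=4096,    # context window
    n_threads=8,   # CPU threads on typical consumer hardware
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a radiology assistant."},
        {"role": "user",
         "content": "Write the impression for this findings section: ..."},
    ],
    max_tokens=256,
    temperature=0.0,  # deterministic decoding for structured tasks
)
print(out["choices"][0]["message"]["content"])
```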