A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models
arXiv cs.CL / 4/7/2026
Key Points
- The study addresses the extraction of structured clinical data from unstructured paediatric histopathology text in electronic patient records (EPR) without relying on cloud LLM services, which raise privacy concerns.
- It proposes a resource-efficient, semi-automated annotation workflow that frames extraction as clinician-guided question answering for small language models, supported by few-shot examples and domain-specific entity guidelines.
- Using paediatric renal biopsy reports as a constrained, well-characterized domain, the authors manually annotated 400 of 2,111 reports from Great Ormond Street Hospital to serve as a gold standard.
- Of the five instruction-tuned small language models evaluated, Gemma 2 2B achieved the highest accuracy (84.3%), outperforming off-the-shelf NLP baselines such as spaCy (74.3%) and several biomedical QA models with lower scores.
- Clinician-written entity guidelines and few-shot prompting each improved extraction accuracy (guidelines: +7–19%; few-shot: +6–38%). The workflow runs on CPU-only hardware and requires minimal clinician time, and the code is publicly released.
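
The question-answering framing described in the key points can be illustrated with a minimal sketch. This is not the authors' released code: the names `FewShotExample`, `build_prompt`, and `accuracy` are hypothetical, the prompt layout is an assumption, and exact-match scoring is a stand-in for whatever metric the paper actually uses. Any report text passed in for illustration should be synthetic, not patient data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FewShotExample:
    """One worked example shown to the model (hypothetical structure)."""
    report: str
    answer: str


def build_prompt(question: str, guideline: str,
                 examples: List[FewShotExample], report: str) -> str:
    """Assemble a QA-style prompt for a single entity.

    Assumed layout: instruction, clinician-written guideline, few-shot
    examples, then the target report with an open "Answer:" slot.
    """
    parts = [
        "Answer the question using only the report text.",
        f"Guideline: {guideline}",
    ]
    for ex in examples:
        parts.append(
            f"Report: {ex.report}\nQuestion: {question}\nAnswer: {ex.answer}"
        )
    # The final block leaves the answer empty for the model to complete.
    parts.append(f"Report: {report}\nQuestion: {question}\nAnswer:")
    return "\n\n".join(parts)


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions matching the gold label exactly,
    after trimming whitespace and lowercasing (an assumed metric)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    matches = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold)
    )
    return matches / len(gold)
```

The returned prompt string could be passed to any locally hosted instruction-tuned model, and the model outputs compared against the manually annotated gold standard with `accuracy`. Keeping the guideline and examples as separate arguments makes it straightforward to switch either one off and measure its contribution, in the spirit of the ablations reported above.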