Leveraging Large Language Models to Extract and Translate Medical Information in Doctors' Notes for Health Records and Diagnostic Billing Codes
arXiv cs.CL / 3/25/2026
Key Points
- The thesis proposes a privacy-preserving, on-device offline system that uses open-weight LLMs to extract medical facts from doctors’ notes and map them to ICD-10-CM diagnostic billing codes without cloud services.
- It evaluates multiple local open-weight models (e.g., Llama 3.2, Mistral, Phi, DeepSeek) on consumer-grade hardware using Ollama, LangChain, and containerized deployment, along with a synthetic medical-note benchmark.
- Enforcing a strict JSON output schema yields near-100% formatting compliance, but generating the correct, specific diagnostic codes remains difficult—especially for smaller 7B–20B parameter models.
- The work finds that few-shot prompting can degrade accuracy, with models overfitting to the provided examples and hallucinating codes, while retrieval-augmented generation helps surface unseen codes but often suffers from context-window saturation.
- The authors conclude that fully automated unsupervised coding with local open-source models is not yet dependable and recommend a human-in-the-loop workflow, while contributing a reproducible local LLM pipeline and benchmark dataset.
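The strict JSON output schema mentioned in the key points can be thought of as a hard validation gate on the model's response. A minimal sketch in Python of such a gate, with assumed field names (`diagnoses`, `icd10_code`, `description`) and a simplified ICD-10-CM pattern — the thesis's actual schema and validation rules may differ:

```python
import json
import re

# Simplified ICD-10-CM shape: a letter, two alphanumerics, then an optional
# dot and up to four more characters. Real validity requires checking against
# the official code set; this only catches structural malformation.
ICD10_PATTERN = re.compile(r"^[A-TV-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")

def validate_model_output(raw: str) -> list[dict]:
    """Parse an LLM response and strictly enforce the assumed JSON schema.

    Raises ValueError on any deviation; formatting compliance is exactly
    this kind of check, and it says nothing about whether the extracted
    code is the clinically correct one.
    """
    data = json.loads(raw)  # must be valid JSON at all
    if not isinstance(data, dict) or "diagnoses" not in data:
        raise ValueError("top-level object must contain 'diagnoses'")
    diagnoses = data["diagnoses"]
    if not isinstance(diagnoses, list):
        raise ValueError("'diagnoses' must be a list")
    for entry in diagnoses:
        if not isinstance(entry, dict) or set(entry) != {"icd10_code", "description"}:
            raise ValueError(f"unexpected entry shape: {entry!r}")
        if not ICD10_PATTERN.match(entry["icd10_code"]):
            raise ValueError(f"malformed ICD-10-CM code: {entry['icd10_code']}")
    return diagnoses

# A well-formed response passes the gate:
sample = ('{"diagnoses": [{"icd10_code": "E11.9", '
          '"description": "Type 2 diabetes mellitus without complications"}]}')
extracted = validate_model_output(sample)
```

This separation — schema compliance checked deterministically in code, clinical correctness left to evaluation — is consistent with the summary's finding that near-100% formatting compliance is achievable even when the specific codes are wrong.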