SemioLLM: Evaluating Large Language Models for Diagnostic Reasoning from Unstructured Clinical Narratives in Epilepsy
arXiv cs.CL / 4/1/2026
Key Points
- The SemioLLM study evaluates eight large language models on an epilepsy diagnostic-reasoning task that maps seizure-description phrases from unstructured clinical narratives to one of seven seizure onset zones using likelihood estimates.
- Results show that, with prompt engineering and clinician-guided chain-of-thought reasoning, several models produce likelihood estimates that frequently match the ground truth and approach clinician-level accuracy.
- Model performance is strongly influenced by clinical in-context impersonation, narrative length, and language context, with accuracy shifting substantially across these conditions.
- Expert review of reasoning outputs finds that correct predictions can still rely on hallucinated knowledge and inaccurate source citation, highlighting interpretability and reliability gaps for clinical deployment.
- The paper proposes SemioLLM as a scalable, domain-adaptable evaluation framework for clinical settings where diagnostic information is embedded in free-text narratives.
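The core task described above — mapping a clinical narrative to likelihoods over seven seizure onset zones and scoring them against ground truth — can be sketched as follows. This is an illustrative sketch, not the paper's code: the zone labels and the top-1 scoring metric are assumptions made for demonstration.

```python
# Hypothetical sketch of a SemioLLM-style scoring step. The model is assumed
# to return raw likelihoods over seven seizure onset zones; the zone names
# below are illustrative placeholders, not necessarily the paper's labels.

SEIZURE_ONSET_ZONES = [
    "frontal", "temporal", "parietal", "occipital",
    "insular", "cingulate", "hypothalamic",
]

def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw likelihoods so they sum to 1."""
    total = sum(scores.values())
    return {zone: value / total for zone, value in scores.items()}

def top1_match(predicted: dict[str, float], truth: dict[str, float]) -> bool:
    """True if the model's most likely zone equals the ground-truth mode."""
    return max(predicted, key=predicted.get) == max(truth, key=truth.get)

# Example: the model assigns most mass to the temporal lobe, matching truth.
pred = normalize({"frontal": 0.10, "temporal": 0.70, "parietal": 0.05,
                  "occipital": 0.05, "insular": 0.05, "cingulate": 0.03,
                  "hypothalamic": 0.02})
truth = {"temporal": 1.0,
         **{z: 0.0 for z in SEIZURE_ONSET_ZONES if z != "temporal"}}
print(top1_match(pred, truth))  # → True
```

A distributional metric (e.g. KL divergence between the normalized likelihoods and the ground-truth distribution) would capture partial credit better than top-1 agreement; the choice here is purely for brevity.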