AI-Generated Prior Authorization Letters: Strong Clinical Content, Weak Administrative Scaffolding

arXiv cs.AI · April 1, 2026

Key Points

  • The paper evaluates three commercial LLMs (GPT-4o, Claude Sonnet 4.5, and Gemini 2.5 Pro) on 45 physician-validated synthetic prior authorization scenarios spanning five specialties and finds that all three can generate clinically strong letters.
  • Across models, the letters tend to include accurate diagnoses, well-formed medical necessity narratives, and clear step-therapy documentation.
  • A separate analysis against real-world payer administrative requirements reveals systematic omissions that clinical quality scoring misses: absent billing codes, unspecified authorization durations, and incomplete follow-up plans (see the checklist sketch after this list).
  • The authors argue that the primary barrier to clinical deployment is not LLM clinical writing capability but the surrounding systems’ ability to deliver payer-specific administrative precision.
  • The study moves beyond single-case demonstrations by using structured, multi-scenario evaluation to better characterize what “submission-ready” prior authorization support requires.
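
The paper does not release its administrative rubric, so the following is a minimal sketch of how such a payer-requirements audit might be automated, assuming simple regex checks over the generated letter text; the check names and patterns here are illustrative stand-ins, not the authors' checklist.

```python
import re

# Hypothetical checklist covering the elements the paper reports as commonly
# missing; the patterns are illustrative, not the authors' rubric.
ADMIN_CHECKS = {
    # CPT codes are five digits; HCPCS Level II codes are a letter plus four digits.
    "billing_code": re.compile(r"\b(?:\d{5}|[A-Z]\d{4})\b"),
    # An explicit requested duration, e.g. "6 months" or "12-month authorization".
    "authorization_duration": re.compile(r"\b\d+[- ](?:week|month|year)s?\b", re.I),
    # Any stated follow-up or monitoring plan.
    "follow_up_plan": re.compile(r"\b(?:follow[- ]up|monitoring)\b", re.I),
}

def missing_elements(letter: str) -> list[str]:
    """Return the checklist items a generated letter omits."""
    return [name for name, pattern in ADMIN_CHECKS.items() if not pattern.search(letter)]

if __name__ == "__main__":
    sample = (
        "Diagnosis: rheumatoid arthritis (M05.79). The patient failed methotrexate "
        "and sulfasalazine; adalimumab is medically necessary."
    )
    print(missing_elements(sample))
    # ['billing_code', 'authorization_duration', 'follow_up_plan']
```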

Abstract

Prior authorization remains one of the most burdensome administrative processes in U.S. healthcare, consuming billions of dollars and thousands of physician hours each year. While large language models have shown promise across clinical text tasks, their ability to produce submission-ready prior authorization letters has received only limited attention, with existing work confined to single-case demonstrations rather than structured multi-scenario evaluation. We assessed three commercially available LLMs (GPT-4o, Claude Sonnet 4.5, and Gemini 2.5 Pro) across 45 physician-validated synthetic scenarios spanning rheumatology, psychiatry, oncology, cardiology, and orthopedics. All three models generated letters with strong clinical content: accurate diagnoses, well-structured medical necessity arguments, and thorough step-therapy documentation. However, a secondary analysis of real-world administrative requirements revealed consistent gaps that clinical scoring alone did not capture, including absent billing codes, missing authorization duration requests, and inadequate follow-up plans. These findings reframe the question: the challenge for clinical deployment is not whether LLMs can write clinically adequate letters, but whether the systems built around them can supply the administrative precision that payer workflows require.
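
The paper's evaluation harness is likewise unreleased; the sketch below only illustrates the two-track design the abstract describes, with the model wrapper, clinical rubric, and payer checklist left as caller-supplied placeholders (all names here are hypothetical).

```python
from typing import Callable

# Placeholder callables: none of these correspond to a published interface.
GenerateFn = Callable[[dict], str]    # scenario -> generated letter
ClinicalFn = Callable[[str], float]   # letter -> rubric score in [0, 1]
AdminFn = Callable[[str], list[str]]  # letter -> missing admin elements

def evaluate(
    models: dict[str, GenerateFn],
    scenarios: list[dict],
    clinical_score: ClinicalFn,
    admin_audit: AdminFn,
) -> list[dict]:
    """Score every (model, scenario) pair on two independent tracks, so a
    letter can rate as clinically strong yet still fail the payer checklist."""
    results = []
    for model_name, generate in models.items():
        for scenario in scenarios:
            letter = generate(scenario)
            results.append({
                "model": model_name,
                "scenario_id": scenario.get("id"),
                "clinical": clinical_score(letter),  # track 1: clinical quality
                "admin_gaps": admin_audit(letter),   # track 2: payer requirements
            })
    return results
```

Keeping the two scoring tracks independent is the design point worth preserving: it is what allows a study like this to report high clinical quality and systematic administrative gaps for the same letters.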