Optimal Question Selection from a Large Question Bank for Clinical Field Recovery in Conversational Psychiatric Intake

arXiv cs.AI / 4/27/2026

Key Points

  • The paper frames psychiatric intake as a high-stakes sequential decision problem, where clinicians (or systems) must choose which clinically grounded questions to ask, in what order, and how to handle ambiguous or incomplete patient responses under time constraints.
  • It introduces a dedicated benchmark built from 655 clinician-authored intake questions paired with synthetic patient vignettes covering five behavioral conditions, enabling controlled evaluation of conversational “field recovery” performance.
  • In experiments across 300 simulated interview sessions, a fixed, clinically ordered intake form substantially beats random questioning, while an LLM-guided adaptive question-selection policy achieves the best overall recovery (a minimal sketch of such a loop follows this list).
  • The LLM-guided policy’s gains are especially large when patient behavior makes information harder to recover, with the biggest improvement under guarded-concise responses.
  • The results emphasize that conversational clinical performance depends not only on language understanding but also on topic discovery and adherence to the right clinical structure within a limited interaction budget.
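
To ground the comparison, here is a minimal sketch of how such a question-selection loop could look. It is illustrative only, not the authors' implementation: the field layout, the 30-question budget, the `patient` object's interface, and the LLM prompt format are all assumptions; only the bank size (655 questions) and the three policy types come from the paper.

```python
import random

# Illustrative question bank: each question targets one clinical field
# (e.g., sleep, mood, substance use). The real bank holds 655
# clinician-authored questions; the field layout here is invented.
QUESTION_BANK = [
    {"id": i, "field": f"field_{i % 20}", "text": f"question {i}"}
    for i in range(655)
]
ALL_FIELDS = {q["field"] for q in QUESTION_BANK}

def random_policy(asked, recovered, transcript):
    """Baseline: pick any unasked question uniformly at random."""
    return random.choice([q for q in QUESTION_BANK if q["id"] not in asked])

def fixed_form_policy(asked, recovered, transcript):
    """Baseline: walk the clinically ordered intake form top to bottom
    (assumes QUESTION_BANK is stored in clinical order)."""
    return next(q for q in QUESTION_BANK if q["id"] not in asked)

def make_llm_policy(llm):
    """Adaptive policy: ask an LLM (any text-in/text-out callable; this
    prompt format is an assumption) which missing field to pursue next."""
    def policy(asked, recovered, transcript):
        missing = sorted(ALL_FIELDS - recovered)
        target = llm(
            f"Transcript so far: {transcript}\n"
            f"Fields still missing: {missing}\n"
            "Name the single field the next question should target."
        ).strip()
        candidates = [q for q in QUESTION_BANK
                      if q["field"] == target and q["id"] not in asked]
        return candidates[0] if candidates else random_policy(asked, recovered, transcript)
    return policy

def run_session(policy, patient, budget=30):
    """One simulated interview: ask up to `budget` questions (the budget
    value is a guess) and score the fraction of the patient's target
    fields recovered. `patient` stands in for the paper's vignette-driven
    simulator: respond() returns an answer and the field it reveals, if any."""
    asked, recovered, transcript = set(), set(), []
    for _ in range(budget):
        q = policy(asked, recovered, transcript)
        answer, revealed = patient.respond(q)
        asked.add(q["id"])
        if revealed is not None:
            recovered.add(revealed)
        transcript.append((q["text"], answer))
    return len(recovered) / len(patient.target_fields)
```

Even in this toy form, the paper's central contrast is visible: the fixed form bakes clinical structure into the question ordering, while the adaptive policy spends its remaining budget on fields the dialogue so far suggests are still missing.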

Abstract

Psychiatric intake is a sequential, high-stakes information-gathering process in which clinicians must decide what to ask, in what order, and how to interpret incomplete or ambiguous responses under limited time. Despite growing interest in conversational AI for healthcare, infrastructure for studying this application remains limited. Accordingly, we formulate this task as a question-selection problem with clinically grounded questions, known target information, and controllable patient difficulty. We also introduce a task-specific question-selection benchmark based on a bank of 655 clinician-authored intake questions and corresponding synthetic patient vignettes with five different behavioral conditions. In our evaluation, we compare random questioning, a clinical psychiatric intake form baseline, and an LLM-guided adaptive policy across 300 interview sessions spanning four patients and five behavioral conditions. Across the benchmark, the clinically ordered fixed form substantially outperforms random questioning, and the LLM-guided policy achieves the strongest overall recovery. The advantage of adaptation grows sharply under patient behavior that is less amenable to field recovery, especially under guarded-concise conditions. These findings suggest that performance in conversational clinical systems depends not only on language understanding after information is disclosed, but also on whether the system reaches the right topics within a limited interaction budget. More broadly, the benchmark provides a controlled framework for studying how clinical structure and adaptive follow-up contribute to information recovery in interactive clinical machine learning.
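
The evaluation grid decomposes cleanly: 300 sessions over 4 patients × 5 behavioral conditions means 15 sessions per cell. A small harness in the same illustrative spirit (reusing `run_session` from the sketch above; `make_patient` and every condition label except guarded-concise are placeholders) could aggregate per-condition recovery rates as follows, which is where a condition-level gap like the guarded-concise one would surface:

```python
from itertools import product
from statistics import mean

# The paper reports 300 sessions over 4 patients x 5 behavioral
# conditions, i.e. 15 sessions per (patient, condition) cell.
# "guarded_concise" is named in the paper; the other labels are placeholders.
PATIENTS = ["p1", "p2", "p3", "p4"]
CONDITIONS = ["guarded_concise", "cond_2", "cond_3", "cond_4", "cond_5"]
SESSIONS_PER_CELL = 300 // (len(PATIENTS) * len(CONDITIONS))  # = 15

def evaluate(policy, make_patient, run_session):
    """Mean field-recovery rate per behavioral condition.
    `make_patient(pid, cond)` stands in for the paper's vignette-driven
    patient simulator; `run_session` is the loop sketched earlier."""
    by_condition = {c: [] for c in CONDITIONS}
    for pid, cond in product(PATIENTS, CONDITIONS):
        for _ in range(SESSIONS_PER_CELL):
            by_condition[cond].append(run_session(policy, make_patient(pid, cond)))
    return {c: mean(rates) for c, rates in by_condition.items()}
```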
