A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation

arXiv cs.AI / 4/16/2026


Key Points

  • The paper argues that current dialogue-based EMR systems are mostly passive (transcribe → extract → generate notes) and therefore fail to address key requirements for proactive support: streaming ASR noise handling, punctuation recovery, and stable diagnostic belief tracking.
  • It proposes an end-to-end proactive EMR assistant pipeline that combines streaming speech recognition, punctuation restoration, stateful extraction, belief stabilization, objectified retrieval, action planning, and replayable report generation.
  • In a preliminary controlled pilot with ten streamed doctor–patient dialogues plus a 300-query retrieval benchmark, the full system achieves state-event F1 of 0.84 and retrieval Recall@5 of 0.87, along with pilot scores indicating strong coverage and structural completeness.
  • Ablation results indicate that punctuation restoration and belief stabilization likely improve downstream extraction, retrieval, and action selection, supporting the motivation for these components.
  • The authors emphasize these are controlled, simulated pilot results and should not be interpreted as evidence of clinical deployment readiness, safety, or real-world utility.
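Among the components above, "belief stabilization" is the least self-explanatory. The paper summary does not spell out the algorithm, but the intent (keeping the diagnostic hypothesis from flipping on every noisy streamed utterance) can be illustrated with one common pattern: exponential smoothing of per-hypothesis scores plus a switching margin (hysteresis). The function and parameter names below are hypothetical, not taken from the paper:

```python
# Hypothetical sketch of belief stabilization over streaming updates.
# The paper does not publish its algorithm; this shows one standard
# approach: exponentially smooth per-hypothesis scores, and only let
# the top hypothesis change when a rival beats it by a clear margin.

def stabilize(beliefs, evidence, alpha=0.3):
    """Blend new per-hypothesis evidence into running beliefs.

    beliefs:  dict hypothesis -> smoothed score (mutated in place)
    evidence: dict hypothesis -> raw score from the latest utterance
    alpha:    weight of the newest evidence (0 = ignore, 1 = no smoothing)
    """
    for hypothesis, score in evidence.items():
        prev = beliefs.get(hypothesis, 0.0)
        beliefs[hypothesis] = (1 - alpha) * prev + alpha * score
    return beliefs

def current_top(beliefs, previous_top, margin=0.1):
    """Keep the previous top hypothesis unless a rival clearly beats it."""
    if not beliefs:
        return previous_top
    best = max(beliefs, key=beliefs.get)
    if previous_top is None or previous_top not in beliefs:
        return best
    # Hysteresis: require a margin before switching the displayed belief.
    if beliefs[best] - beliefs[previous_top] > margin:
        return best
    return previous_top
```

Under this sketch, a single noisy utterance that briefly favors a rival diagnosis raises its smoothed score only by a fraction of `alpha`, so the displayed hypothesis stays put unless the evidence persists.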

Abstract

Most dialogue-based electronic medical record (EMR) systems still behave as passive pipelines: transcribe speech, extract information, and generate the final note after the consultation. That design improves documentation efficiency, but it is insufficient for proactive consultation support because it does not explicitly address streaming speech noise, missing punctuation, unstable diagnostic belief, objectification quality, or measurable next-action gains. We present an end-to-end proactive EMR assistant built around streaming speech recognition, punctuation restoration, stateful extraction, belief stabilization, objectified retrieval, action planning, and replayable report generation. The system is evaluated in a preliminary controlled setting using ten streamed doctor-patient dialogues and a 300-query retrieval benchmark aggregated across dialogues. The full system reaches state-event F1 of 0.84, retrieval Recall@5 of 0.87, and end-to-end pilot scores of 83.3% coverage, 81.4% structural completeness, and 80.0% risk recall. Ablations further suggest that punctuation restoration and belief stabilization may improve downstream extraction, retrieval, and action selection within this pilot. These results were obtained in a controlled, simulated pilot setting and should not be read as evidence of clinical deployment readiness, clinical safety, or real-world clinical utility. They suggest only that the proposed online architecture is technically coherent and directionally supported under tightly controlled pilot conditions; the study should be read as a pilot concept demonstration, not as evidence of clinical generalizability.
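For readers unfamiliar with the two headline metrics, the sketch below shows how they are conventionally computed: micro-averaged F1 over predicted versus gold state events, and Recall@k averaged over retrieval queries. The paper's exact matching rules are not given in this summary; this sketch assumes exact-match events and a single gold item per query:

```python
# Conventional definitions of the two headline metrics; the paper's
# exact matching rules are an assumption here (exact-match events,
# one gold item per retrieval query).

def event_f1(predicted, gold):
    """Micro F1 between two sets of (dialogue_id, event) tuples."""
    tp = len(predicted & gold)  # true positives: events in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def recall_at_k(rankings, gold_items, k=5):
    """Fraction of queries whose gold item appears in the top-k ranking.

    rankings:   dict query_id -> ranked list of retrieved item ids
    gold_items: dict query_id -> the single relevant item id
    """
    hits = sum(1 for q, ranked in rankings.items()
               if gold_items[q] in ranked[:k])
    return hits / len(rankings)
```

With these definitions, the reported Recall@5 of 0.87 over the 300-query benchmark would mean the relevant item appeared in the top five results for roughly 261 of the 300 queries.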