A doctor sees 20–30 patients a day.
After each visit, they spend 15–20 minutes typing notes into an EHR system — not treating patients, just filling forms. By the end of the week, that’s 8+ hours of pure documentation overhead.
This is not a workflow problem. It’s a crisis. Physician burnout is real, and the paperwork is a massive driver. At Prolifics, we work with healthcare clients modernizing their infrastructure, and this problem kept coming up — every single time.
So we built a pipeline to fix it: an ambient AI scribe that listens to the doctor-patient conversation and auto-generates a structured clinical note directly into the EHR.
Here’s exactly how we did it — architecture, tools, tradeoffs, and results.
The Real Problem (It’s Not Just “Too Much Typing”)
Before we touch any code, let’s be clear about what’s actually broken.
EHR systems like Epic and Cerner are powerful but clinician-unfriendly.
They’re built for compliance, not usability. A doctor has to manually:
•Select the right SOAP note template
•Type symptoms, history, assessment, and plan
•Attach ICD-10 codes
•Sign off before moving to the next patient
Meanwhile the patient is sitting there watching them stare at a screen.
“I went to medical school to help people — not to become a data entry clerk.”
— A physician at one of our client health systems
This quote stuck with us. It’s the problem statement in one sentence.
The Architecture: Five Layers You Need to Get Right
A production ambient documentation system isn’t just “speech-to-text + GPT.” It’s a pipeline with five distinct technical layers:
[Microphone Input]
↓
[ASR — Automatic Speech Recognition]
↓
[Speaker Diarization — Who Said What]
↓
[Clinical NLP + Named Entity Recognition]
↓
[LLM Summarization → SOAP Note]
↓
[FHIR API → EHR (Epic / Cerner)]
Let’s walk through each one.
Layer 1: ASR — Automatic Speech Recognition
We evaluated three options:
Tool | Accuracy | HIPAA BAA | Latency
Whisper (OpenAI) | Very High | No (self-hosted only) | Medium
Azure Speech | High | Yes (with config) | Low
AWS Transcribe Medical | High | Yes (native) | Low
We went with AWS Transcribe Medical for production because it’s purpose-built for clinical vocabulary — it handles terms like “metformin,” “ejection fraction,” and “CABG” without custom vocabulary tuning.
import boto3

transcribe = boto3.client('transcribe', region_name='us-east-1')

def start_transcription(audio_s3_uri: str, job_name: str):
    # Kick off an asynchronous Transcribe Medical batch job for the encounter audio.
    transcribe.start_medical_transcription_job(
        MedicalTranscriptionJobName=job_name,
        Media={'MediaFileUri': audio_s3_uri},
        MediaFormat='mp4',
        LanguageCode='en-US',        # Transcribe Medical currently supports US English only
        Specialty='PRIMARYCARE',
        Type='DICTATION',            # Transcribe Medical also offers CONVERSATION for two-speaker audio
        OutputBucketName='your-hipaa-bucket'
    )
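Batch jobs run asynchronously, so the pipeline polls for completion and then pulls the transcript JSON out of the output bucket. Roughly like this (a sketch reusing the client above; the polling loop is simplified and error handling is minimal):
import time

def wait_for_transcript(job_name: str) -> str:
    # Poll the job until it finishes, then return the S3 URI of the transcript JSON.
    while True:
        job = transcribe.get_medical_transcription_job(
            MedicalTranscriptionJobName=job_name
        )['MedicalTranscriptionJob']
        status = job['TranscriptionJobStatus']
        if status == 'COMPLETED':
            return job['Transcript']['TranscriptFileUri']
        if status == 'FAILED':
            raise RuntimeError(job.get('FailureReason', 'Transcription job failed'))
        time.sleep(5)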
For real-time use cases (in-room), we streamed audio directly using the AWS Transcribe Medical streaming API via websockets — latency under 300ms in testing.
Layer 2: Speaker Diarization
Raw transcript is useless without knowing who said what. The doctor and patient speak differently, and mixing their words into a single block breaks downstream NLP.
We used pyannote.audio (open-source, self-hosted) for speaker segmentation:
from pyannote.audio import Pipeline

# Gated model on Hugging Face: loading it requires an access token (use_auth_token).
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization")
diarization = pipeline("encounter_audio.wav")

# Each track is a (segment, _, speaker_label) triple.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{speaker}: [{turn.start:.1f}s → {turn.end:.1f}s]")
In practice, we labeled Speaker_00 as “Clinician” and Speaker_01 as “Patient” based on the first 10 seconds of audio (doctors always open the conversation).
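In code, that heuristic plus the turn grouping is only a few lines. The helper below is a hypothetical sketch, not our exact production logic: whoever speaks first becomes “Clinician,” every other speaker becomes “Patient,” and the resulting labels are what we later use to split the transcript into the clinician_turns and patient_turns fed to the LLM.
def label_speakers(diarization):
    # Heuristic: the first diarized speaker is the clinician; everyone else is the patient.
    labels = {}
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        if not labels:
            labels[speaker] = "Clinician"
        elif speaker not in labels:
            labels[speaker] = "Patient"
    return labels  # e.g. {"SPEAKER_00": "Clinician", "SPEAKER_01": "Patient"}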
Layer 3: Clinical NLP + Named Entity Recognition
This is where generic pipelines fall apart. General-purpose LLMs hallucinate dosages, misattribute symptoms, and miss negations (“no chest pain” becomes “chest pain” — a dangerous error).
We ran scispaCy (clinical NLP library) as a pre-processing filter to extract:
•Symptoms (UMLS entity linking)
•Medications + dosages
•Diagnoses (ICD-10 candidate mapping)
•Negations (critical for clinical accuracy)
import scispacy
import spacy

# The scispaCy biomedical NER model distinguishes diseases from chemicals/drugs;
# the general-purpose en_core_sci_lg model only tags spans generically as ENTITY.
nlp = spacy.load("en_ner_bc5cdr_md")

doc = nlp("Patient denies chest pain. Currently on 500mg metformin twice daily.")
for ent in doc.ents:
    print(ent.text, ent.label_)
Output:
chest pain DISEASE
metformin CHEMICAL
Dosage and frequency strings (“500mg”, “twice daily”) aren’t covered by this model and need a separate rule-based extraction pass.
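Negation (the “denies chest pain” case) is handled as a rule-based step on top of the entities. One common way to wire that in is the negspacy extension; a minimal sketch, assuming negspacy rather than our exact production setup:
from negspacy.negation import Negex  # registers the "negex" pipeline factory

# Flag negated entities ("denies chest pain") so they never reach the note as positives.
nlp.add_pipe("negex", config={"ent_types": ["DISEASE"]})

doc = nlp("Patient denies chest pain. Currently on 500mg metformin twice daily.")
for ent in doc.ents:
    print(ent.text, ent.label_, "NEGATED" if ent._.negex else "affirmed")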
We fed this structured output into the LLM prompt — not the raw transcript. This dramatically reduced hallucination.
Layer 4: LLM Summarization → SOAP Note
With structured NER output + diarized transcript, we prompted Claude (via Anthropic API) to generate the clinical note:
system_prompt = """
You are a clinical documentation assistant.
Generate a structured SOAP note from the encounter transcript.
Use only explicitly stated clinical facts.
Never infer diagnoses not mentioned.
Flag any low-confidence fields with [REVIEW NEEDED].
Output in JSON matching HL7 FHIR DocumentReference schema.
"""
user_prompt = f"""
Clinician: {clinician_turns}
Patient: {patient_turns}
Extracted Entities: {ner_output}
Generate SOAP note.
"""
The [REVIEW NEEDED] flag was a non-negotiable requirement from our clinical stakeholders. Doctors need to trust the output before they sign off — a confidence signal is better than silent errors.
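Operationally, the flag is easy to act on: any note containing the marker skips auto-filing and lands in the clinician’s review queue instead. A hypothetical gate, not our exact workflow code:
def requires_clinician_review(soap_note: str) -> bool:
    # Any low-confidence field carries the marker, so a substring check is enough to gate auto-filing.
    return "[REVIEW NEEDED]" in soap_note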
Layer 5: FHIR API → EHR Integration
Generated notes go into Epic via the SMART on FHIR API as a DocumentReference resource:
import base64
import requests

# Wrap the generated SOAP note in a FHIR DocumentReference resource.
fhir_note = {
    "resourceType": "DocumentReference",
    "status": "current",
    "type": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "11506-3",        # LOINC code for a progress note
            "display": "Progress note"
        }]
    },
    # A production note also references the Patient and Encounter resources;
    # they're omitted here to keep the example focused on the document payload.
    "content": [{
        "attachment": {
            "contentType": "text/plain",
            "data": base64.b64encode(soap_note.encode()).decode()  # FHIR attachments are base64-encoded
        }
    }]
}

# FHIR_BASE_URL and access_token come out of the SMART on FHIR OAuth flow.
response = requests.post(
    f"{FHIR_BASE_URL}/DocumentReference",
    json=fhir_note,
    headers={"Authorization": f"Bearer {access_token}"}
)
response.raise_for_status()  # surfaces the 422s Epic returns for bad codes (see below)
One gotcha: Epic’s sandbox FHIR server is strict about LOINC code correctness. Wrong code = silent 422 error. Always validate against the LOINC database first.
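A cheap guard is an allow-list of note-type codes that have already been checked against LOINC, so a typo never reaches Epic. Hypothetical helper (the second code is shown only as another common note type):
# Note-type codes verified against the LOINC database ahead of time.
VALID_NOTE_TYPES = {
    "11506-3": "Progress note",
    "34117-2": "History and physical note",
}

def note_type_coding(code: str) -> dict:
    if code not in VALID_NOTE_TYPES:
        raise ValueError(f"Unverified LOINC note-type code: {code}")
    return {
        "system": "http://loinc.org",
        "code": code,
        "display": VALID_NOTE_TYPES[code],
    }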
HIPAA: The Layer You Can’t Skip
Every component needs a Business Associate Agreement (BAA):
•AWS Transcribe Medical — Native BAA
•S3 buckets — Must be encrypted at rest (AES-256) and in transit (TLS 1.2+)
•LLM API calls — Use private endpoints or self-hosted models
•Audio retention — Define your retention policy upfront: 30 days for QA, or 7+ years as part of the medical record?
We defaulted to deleting raw audio after 72 hours and retaining only the de-identified structured note.
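We enforced the 72-hour deletion with an S3 lifecycle rule rather than a cron job. Lifecycle expiration is day-granular, so 3 days is the closest match; the bucket and prefix names below are placeholders:
import boto3

s3 = boto3.client('s3')

# Automatically expire raw encounter audio; 3 days is the closest day-granular
# approximation of the 72-hour retention policy.
s3.put_bucket_lifecycle_configuration(
    Bucket='your-hipaa-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-raw-audio',
            'Filter': {'Prefix': 'raw-audio/'},
            'Status': 'Enabled',
            'Expiration': {'Days': 3},
        }]
    },
)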
Real Outcomes
We deployed this pipeline at a mid-size primary care group (12 physicians, ~250 encounters/week).
After 60 days:
•Documentation time per encounter: dropped from ~18 minutes to ~4 minutes
•Physician burnout score (validated scale): dropped 13 percentage points (consistent with published research)
•Same-day note completion rate: up from 71% to 96%
•Billing code capture accuracy: improved; the NER layer caught previously missed HCC codes
The physicians weren’t just faster. They told us they felt more present with patients again.
Key Takeaways for Developers
1.Don’t use raw transcripts as LLM input. Pre-process with clinical NLP first or you’ll get hallucinations that could harm patients.
2.Speaker diarization is not optional. Without it, attribution errors corrupt the clinical record.
3.The [REVIEW NEEDED] flag saved the project. Clinicians won’t trust black-box outputs. Build in transparency.
4.FHIR is the integration layer, not just an afterthought. Learn SMART on FHIR early — EHR sandbox environments are painful and slow.
5.HIPAA compliance is architecture, not a checkbox. Design your data flows before you write a single line of code.
What’s Next
We’re currently experimenting with:
•Real-time ambient mode — streaming note generation during the encounter, not after
•Specialty-specific models — oncology and cardiology have very different note structures
•Agentic prior auth — using the structured note output to auto-draft insurance prior authorization requests (currently the #1 admin time sink)
At Prolifics, we’ve been building enterprise healthcare integrations for decades. The combination of LLMs + FHIR APIs is genuinely the most exciting shift we’ve seen in clinical workflow tooling — and we’re just getting started.
Let’s Talk
Have you built anything similar? What was the hardest part of your EHR integration — the FHIR API, the HIPAA compliance layer, or getting clinician buy-in?
Drop your experience in the comments. I’d especially love to hear from anyone who’s tackled real-time ambient scribing — the streaming latency challenges are brutal and I suspect others are hitting the same walls we did.