VERI-DPO: Evidence-Aware Alignment for Clinical Summarization via Claim Verification and Direct Preference Optimization
arXiv cs.CL / 3/12/2026
Key Points
- VERI-DPO combines claim verification with Direct Preference Optimization (DPO) to train a summarizer that stays faithful to fragmented EHR evidence, using a retrieval-augmented verifier to check generated claims against the record.
- The verifier labels each claim-evidence pair as Supported, Not Supported, or Not Addressed, and these signals are used to derive length-controlled, contradiction-anchored preference pairs for training.
- On held-out ICU patients in MIMIC-III-Ext-VeriFact-BHC, Not Supported rates drop from 10.7% to 1.9% (local verifier) and 11.6% to 6.4% (GPT-4o), and validity rises from 76.7% to 82.5%.
- The approach aims to reduce omissions and unsupported statements in LLM-based clinical summarization, improving reliability without sacrificing informative length.
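The pipeline in the key points above can be sketched in code. This is a hypothetical illustration, not the paper's implementation: the `Candidate` class, the scoring weights, and the pairing rule are all assumptions made to show how verifier labels (Supported / Not Supported / Not Addressed) could yield length-controlled, contradiction-anchored preference pairs for DPO.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A candidate summary with per-claim verifier labels (hypothetical schema)."""
    text: str
    labels: list[str]  # one of "Supported", "Not Supported", "Not Addressed" per claim

def support_score(cand: Candidate, max_words: int = 200) -> float:
    # Fraction of claims the verifier marked Supported, with a mild
    # length penalty so preference pairs don't simply reward brevity.
    frac_supported = sum(l == "Supported" for l in cand.labels) / len(cand.labels)
    overflow = max(0, len(cand.text.split()) - max_words) / max_words
    return frac_supported - 0.1 * overflow  # 0.1 is an assumed penalty weight

def build_preference_pairs(cands: list[Candidate]) -> list[tuple[str, str]]:
    # Contradiction-anchored pairing: the rejected side must contain at least
    # one Not Supported (contradicted) claim, and the chosen side must score higher.
    pairs = []
    for chosen in cands:
        for rejected in cands:
            if chosen is rejected:
                continue
            if "Not Supported" in rejected.labels and \
               support_score(chosen) > support_score(rejected):
                pairs.append((chosen.text, rejected.text))
    return pairs
```

The resulting `(chosen, rejected)` pairs would then feed a standard DPO loss; the key idea is that preference direction comes from the verifier's evidence labels rather than from human rankings.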