Sum-of-Checks: Structured Reasoning for Surgical Safety with Large Vision-Language Models
arXiv cs.LG / 4/27/2026
Key Points
- The paper introduces “Sum-of-Checks,” a framework that breaks the Critical View of Safety (CVS) criteria in laparoscopic cholecystectomy into expert-defined, clinically grounded visual verification checks.
- For each endoscopic frame, large vision-language models (LVLMs) perform binary judgments with justifications for every check, and the framework computes criterion-level scores via fixed, weighted aggregation.
- On the Endoscapes2023 benchmark using three frontier LVLMs, Sum-of-Checks improves average frame-level mean average precision by 12–14% versus the strongest baselines, including direct prompting, chain-of-thought, and sub-question decomposition.
- The authors find LVLMs are more reliable on observational checks (e.g., visibility and tool obstruction) but vary significantly on decision-critical anatomical evidence, highlighting where structured reasoning helps most.
- The study concludes that separating evidence elicitation from decision-making increases both accuracy and auditability for safety-critical surgical AI systems.
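The check-then-aggregate pipeline described in the key points can be sketched as follows. This is a minimal illustration, not the paper's implementation: the check names, the criterion groupings, and the weights are invented for demonstration, and the LVLM's binary judgments are represented as a plain dictionary.

```python
# Sketch of "Sum-of-Checks"-style aggregation: an LVLM answers binary
# visual checks per frame, and criterion-level scores are computed as
# fixed weighted sums of those judgments. All names and weights below
# are hypothetical placeholders, not values from the paper.
from typing import Dict

# Hypothetical checks per CVS criterion, with illustrative fixed weights
# that sum to 1.0 within each criterion.
CRITERIA: Dict[str, Dict[str, float]] = {
    "two_structures": {
        "cystic_duct_visible": 0.5,
        "cystic_artery_visible": 0.5,
    },
    "hepatocystic_triangle_cleared": {
        "fat_cleared": 0.6,
        "no_tool_obstruction": 0.4,
    },
    "cystic_plate_exposed": {
        "lower_gallbladder_dissected": 1.0,
    },
}

def criterion_scores(check_judgments: Dict[str, bool]) -> Dict[str, float]:
    """Aggregate per-check binary judgments into per-criterion scores
    via a fixed weighted sum; missing checks count as False."""
    return {
        criterion: sum(
            weight
            for check, weight in weights.items()
            if check_judgments.get(check, False)
        )
        for criterion, weights in CRITERIA.items()
    }

# Example: binary judgments an LVLM might return for one frame.
judgments = {
    "cystic_duct_visible": True,
    "cystic_artery_visible": False,
    "fat_cleared": True,
    "no_tool_obstruction": True,
    "lower_gallbladder_dissected": False,
}
scores = criterion_scores(judgments)
# scores: {"two_structures": 0.5,
#          "hepatocystic_triangle_cleared": 1.0,
#          "cystic_plate_exposed": 0.0}
```

Because the weights are fixed rather than learned, each criterion score can be traced back to the individual checks (and the model's justifications for them), which is the auditability property the study emphasizes.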