CARE: Privacy-Compliant Agentic Reasoning with Evidence Discordance

arXiv cs.CL / 4/3/2026


Key Points

  • The paper studies how LLM systems degrade in high-stakes settings when evidence is internally inconsistent, using healthcare cases where patient symptoms contradict medical signs.
  • It introduces MIMIC-DOS, a new dataset for short-horizon prediction of organ dysfunction worsening in the ICU, derived from MIMIC-IV and curated specifically for sign–symptom discordance.
  • The authors propose CARE, a multi-stage privacy-compliant agentic reasoning framework that separates roles: a remote LLM generates structured reasoning scaffolds without seeing sensitive patient data, while a local LLM uses them for evidence acquisition and final decisions.
  • Experiments indicate CARE outperforms several baseline approaches (including single-pass LLMs and other agentic pipelines) across key metrics, showing improved robustness to conflicting clinical evidence while maintaining privacy constraints.

Abstract

Large language model (LLM) systems are increasingly used to support high-stakes decision-making, but they typically perform worse when the available evidence is internally inconsistent. This scenario arises in real-world healthcare settings, where patient-reported symptoms contradict objective medical signs. To study this problem, we introduce MIMIC-DOS, a dataset for short-horizon prediction of organ dysfunction worsening in the intensive care unit (ICU). We derive this dataset from MIMIC-IV, a widely used, publicly available electronic health record dataset, and construct it exclusively from cases that exhibit discordance between signs and symptoms. This setting poses a substantial challenge for existing LLM-based approaches: single-pass LLMs and agentic pipelines alike often struggle to reconcile such conflicting signals. To address this problem, we propose CARE, a multi-stage privacy-compliant agentic reasoning framework in which a remote LLM provides guidance by generating structured categories and transitions without accessing sensitive patient data, while a local LLM uses these categories and transitions to support evidence acquisition and final decision-making. Empirically, CARE achieves stronger performance across all key metrics than multiple baseline settings, showing that it can handle conflicting clinical evidence more robustly while preserving privacy.
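The privacy split at the heart of CARE can be illustrated with a minimal sketch. Everything below is hypothetical: the function names, the scaffold structure, and the toy decision rule are illustrative stand-ins, not the paper's actual prompts or interfaces. The point is the data-flow contract: the remote model receives only a de-identified task description, while the patient record never leaves the local component.

```python
# Minimal sketch of CARE's remote/local separation (hypothetical names;
# the real framework's prompts and model interfaces are not specified here).

def remote_scaffold(task_description: str) -> dict:
    """Stand-in for the remote LLM: returns structured reasoning
    categories and transitions WITHOUT seeing any patient record.
    In a real system this would be an API call to a frontier model."""
    return {
        "categories": ["vital_signs", "labs", "patient_reported_symptoms"],
        "transitions": [
            ("patient_reported_symptoms", "labs"),
            ("labs", "vital_signs"),
        ],
    }

def local_decide(patient_record: dict, scaffold: dict) -> str:
    """Stand-in for the local LLM: acquires evidence following the
    scaffold and makes the final call. Toy rule for the discordant
    case: objective signs override self-reported symptoms."""
    evidence = {c: patient_record.get(c) for c in scaffold["categories"]}
    signs_worsening = evidence["vital_signs"] == "deteriorating"
    symptoms_worsening = evidence["patient_reported_symptoms"] == "worsening"
    if signs_worsening or symptoms_worsening:
        return "worsening"
    return "stable"

# Sensitive data stays in the local process; only the generic task
# description crosses the privacy boundary.
record = {
    "vital_signs": "deteriorating",
    "labs": "lactate rising",
    "patient_reported_symptoms": "feeling fine",  # discordant self-report
}
scaffold = remote_scaffold("short-horizon organ dysfunction worsening")
print(local_decide(record, scaffold))  # -> worsening
```

The design choice this sketch captures is that the remote model contributes only task-level structure (what evidence categories to examine and in what order), so no protected health information is ever transmitted, while all record-specific reasoning happens locally.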