Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning

arXiv cs.AI / 3/17/2026

📰 NewsIdeas & Deep AnalysisModels & Research

共有:

Key Points

The paper conducts a systematic black-box evaluation of faithfulness in medical reasoning among three widely used closed-source LLMs (e.g., ChatGPT and Gemini).
It introduces three perturbation-based probes—causal ablation, positional bias, and hint injection—to assess whether explanations reflect true reasoning, input positioning, or external cues.
It combines quantitative probes with a small-scale human evaluation to compare physician assessments of faithfulness with lay trust perceptions.
The results show that chain-of-thought steps often do not causally drive predictions, external hints are readily incorporated without acknowledgment, and positional biases showed minimal impact in this setting.
The findings argue that faithfulness, not just accuracy, must be central in evaluating LLMs for medicine to ensure safe clinical deployment.

Abstract

Closed-source large language models (LLMs), such as ChatGPT and Gemini, are increasingly consulted for medical advice, yet their explanations may appear plausible while failing to reflect the model's underlying reasoning process. This gap poses serious risks as patients and clinicians may trust coherent but misleading explanations. We conduct a systematic black-box evaluation of faithfulness in medical reasoning among three widely used closed-source LLMs. Our study consists of three perturbation-based probes: (1) causal ablation, testing whether stated chain-of-thought (CoT) reasoning causally influences predictions; (2) positional bias, examining whether models create post-hoc justifications for answers driven by input positioning; and (3) hint injection, testing susceptibility to external suggestions. We complement these quantitative probes with a small-scale human evaluation of model responses to patient-style medical queries to examine concordance between physician assessments of explanation faithfulness and layperson perceptions of trustworthiness. We find that CoT reasoning steps often do not causally drive predictions, and models readily incorporate external hints without acknowledgment. In contrast, positional biases showed minimal impact in this setting. These results underscore that faithfulness, not just accuracy, must be central in evaluating LLMs for medicine, to ensure both public protection and safe clinical deployment.

[R] Combining Identity Anchors + Permission Hierarchies achieves 100% refusal in abliterated LLMs — system prompt only, no fine-tuning

Reddit r/MachineLearning

How I Built an AI SDR Agent That Finds Leads and Writes Personalized Cold Emails

Dev.to

Complete Guide: How To Make Money With Ai

Dev.to

I Analyzed My Portfolio with AI and Scored 53/100 — Here's How I Fixed It to 85+

Dev.to

The Demethylation

Dev.to

Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning

Key Points

Abstract

Related Articles

[R] Combining Identity Anchors + Permission Hierarchies achieves 100% refusal in abliterated LLMs — system prompt only, no fine-tuning

How I Built an AI SDR Agent That Finds Leads and Writes Personalized Cold Emails

Complete Guide: How To Make Money With Ai

I Analyzed My Portfolio with AI and Scored 53/100 — Here's How I Fixed It to 85+

The Demethylation

関連おすすめサービス

Notta搭載AI議事録イヤホン ZENCHORD1

AI搭載ボイスレコーダー Plaud

画像高画質化AIツール Aiarty Image Enhancer