Faithful or Just Plausible? Evaluating the Faithfulness of Closed-Source LLMs in Medical Reasoning
arXiv cs.AI / 3/17/2026
📰 NewsIdeas & Deep AnalysisModels & Research
Key Points
- The paper conducts a systematic black-box evaluation of faithfulness in medical reasoning among three widely used closed-source LLMs (e.g., ChatGPT and Gemini).
- It introduces three perturbation-based probes—causal ablation, positional bias, and hint injection—to assess whether explanations reflect true reasoning, input positioning, or external cues.
- It combines quantitative probes with a small-scale human evaluation to compare physician assessments of faithfulness with lay trust perceptions.
- The results show that chain-of-thought steps often do not causally drive predictions, external hints are readily incorporated without acknowledgment, and positional biases showed minimal impact in this setting.
- The findings argue that faithfulness, not just accuracy, must be central in evaluating LLMs for medicine to ensure safe clinical deployment.
Related Articles
[R] Combining Identity Anchors + Permission Hierarchies achieves 100% refusal in abliterated LLMs — system prompt only, no fine-tuning
Reddit r/MachineLearning
How I Built an AI SDR Agent That Finds Leads and Writes Personalized Cold Emails
Dev.to
Complete Guide: How To Make Money With Ai
Dev.to
I Analyzed My Portfolio with AI and Scored 53/100 — Here's How I Fixed It to 85+
Dev.to
The Demethylation
Dev.to