PRISM: Probing Reasoning, Instruction, and Source Memory in LLM Hallucinations
arXiv cs.CL / April 21, 2026
Key Points
- The paper proposes PRISM, a controlled hallucination benchmark that pinpoints where hallucinations arise in an LLM’s generation pipeline rather than only scoring output-level severity.
- It decomposes hallucinations into four diagnostic dimensions (missing knowledge, knowledge errors, reasoning errors, and instruction-following errors) across three generation stages: memory, instruction, and reasoning.
- PRISM includes 9,448 instances spanning 65 tasks and enables fine-grained, stage-aware evaluation for more actionable debugging of model behavior.
- Tests on 24 mainstream open-source and proprietary LLMs reveal recurring trade-offs, where mitigation methods that improve one dimension (e.g., instruction following) can worsen others (e.g., memory retrieval or reasoning).
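The paper's actual data schema is not reproduced in this digest, so the field names below are invented for illustration. A minimal sketch of what stage-aware evaluation over labeled instances could look like, aggregating failure rates per generation stage rather than a single output-level score:

```python
from collections import Counter

# Hypothetical PRISM-style instance records. The real benchmark's schema is
# not shown in the digest above; these field names are illustrative only.
instances = [
    {"task": "fact_qa",    "stage": "memory",      "dimension": "missing_knowledge", "model_correct": False},
    {"task": "fact_qa",    "stage": "memory",      "dimension": "knowledge_error",   "model_correct": True},
    {"task": "format_gen", "stage": "instruction", "dimension": "instruction_error", "model_correct": False},
    {"task": "math_chain", "stage": "reasoning",   "dimension": "reasoning_error",   "model_correct": True},
]

def stage_error_rates(records):
    """Return the fraction of failed instances per generation stage."""
    totals, failures = Counter(), Counter()
    for r in records:
        totals[r["stage"]] += 1
        if not r["model_correct"]:
            failures[r["stage"]] += 1
    return {stage: failures[stage] / totals[stage] for stage in totals}

print(stage_error_rates(instances))
```

Grouping results this way is what makes the evaluation "actionable": a model that fails mostly in the `memory` bucket calls for a different mitigation than one that fails in `instruction` or `reasoning`.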