How Large Language Models Balance Internal Knowledge with User and Document Assertions

arXiv cs.CL / 4/27/2026


Key Points

  • The paper addresses a key safety problem for LLMs: how they should balance parametric knowledge with simultaneously present user beliefs and retrieved document content in real-world RAG/chat settings.
  • It introduces a three-source interaction framework and evaluates 27 LLMs across three model families on two datasets to study how models choose between user and document assertions (see the illustrative sketch after this list).
  • Results show most models overweight document assertions rather than user assertions, and this tendency becomes stronger after post-training.
  • The behavioral analysis finds that most models are “impressionable,” struggling to distinguish helpful from harmful external information.
  • The authors show that fine-tuning on diverse interaction data from multiple source types can significantly improve the model’s ability to discriminate between helpful and harmful inputs.
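The core of the three-source setup is easy to picture: a question that probes the model's parametric knowledge is posed while a user assertion and a document assertion are both present, and each external source may either support or contradict the correct answer. The sketch below is an illustrative assumption, not the authors' implementation; the field names, prompt template, and labeling heuristic are all hypothetical.

```python
# Hypothetical sketch of a "three-source" evaluation item: parametric knowledge
# (probed by the question) interacts with a user assertion and a document
# assertion in one prompt. Field names and the template are assumptions.
from dataclasses import dataclass

@dataclass
class ThreeSourceItem:
    question: str        # probes the model's parametric knowledge
    user_assertion: str  # what the user claims to believe
    document: str        # content from a retrieved document
    gold_answer: str     # ground truth, used to label each source

    def source_labels(self) -> dict:
        """Label each external source as helpful or harmful relative to the gold answer."""
        return {
            "user": "helpful" if self.gold_answer.lower() in self.user_assertion.lower() else "harmful",
            "document": "helpful" if self.gold_answer.lower() in self.document.lower() else "harmful",
        }

    def to_prompt(self) -> str:
        """Render one chat-style prompt containing all three sources."""
        return (
            f'The user says: "{self.user_assertion}"\n'
            f'A retrieved document states: "{self.document}"\n'
            f"Question: {self.question}\n"
            "Answer concisely."
        )

# Example: the document contradicts the correct answer while the user's belief agrees with it.
item = ThreeSourceItem(
    question="What is the capital of Australia?",
    user_assertion="I'm fairly sure the capital of Australia is Canberra.",
    document="According to this travel guide, Sydney is the capital of Australia.",
    gold_answer="Canberra",
)
print(item.source_labels())  # {'user': 'helpful', 'document': 'harmful'}
print(item.to_prompt())
```

Crossing helpful and harmful variants of both sources yields the interaction grid against which a model's preference for documents over users, and its ability to discriminate good from bad information, can be measured.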

Abstract

Large language models (LLMs) often need to balance their internal parametric knowledge with external information, such as user beliefs and content from retrieved documents, in real-world scenarios like RAG or chat-based systems. A model's ability to reliably process these sources is key to system safety. Previous studies on knowledge conflict and sycophancy are limited to a binary conflict paradigm, primarily exploring conflicts between parametric knowledge and either a document or a user, but ignoring the interactive environment where all three sources exist simultaneously. To fill this gap, we propose a three-source interaction framework and systematically evaluate 27 LLMs from 3 families on 2 datasets. Our findings reveal general patterns: most models rely more on document assertions than user assertions, and this preference is reinforced by post-training. Furthermore, our behavioral analysis shows that most models are impressionable, unable to effectively discriminate between helpful and harmful external information. To address this, we demonstrate that fine-tuning on diverse source interaction data can significantly increase a model's discrimination abilities. In short, our work paves the way for developing trustworthy LLMs that can effectively and reliably integrate multiple sources of information. Code is available at https://github.com/shuowl/llm-source-balancing.
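The fine-tuning remedy mentioned at the end of the abstract can be pictured as supervised data in which the target response accepts an external source only when it supports the correct answer and pushes back when it does not. The sketch below is a rough illustration under that assumption; the function, templates, and phrasing are hypothetical and not taken from the paper or its code release.

```python
# Hypothetical sketch of "diverse source interaction" SFT data: the target
# response agrees with a helpful source and overrides a harmful one.
# All field names and templates are illustrative assumptions.

def build_sft_example(question: str, source_text: str, source_type: str, gold_answer: str) -> dict:
    """Return one (prompt, target) pair teaching the model to discriminate sources."""
    prompt = f"{source_type.capitalize()} assertion: {source_text}\nQuestion: {question}"
    if gold_answer.lower() in source_text.lower():
        # Helpful source: the target answer agrees with it.
        target = f"{gold_answer}. The {source_type} assertion is consistent with this."
    else:
        # Harmful source: the target answer gives the correct fact and declines to rely on it.
        target = (
            f"{gold_answer}. Note that the {source_type} assertion appears to be "
            "incorrect, so I am not relying on it."
        )
    return {"prompt": prompt, "target": target}

# Mix helpful and harmful cases across user and document sources so the model
# sees the full range of interaction types during fine-tuning.
examples = [
    build_sft_example("What is the boiling point of water at sea level?",
                      "I read that water boils at 80 °C at sea level.", "user", "100 °C"),
    build_sft_example("What is the boiling point of water at sea level?",
                      "Water boils at 100 °C at standard atmospheric pressure.", "document", "100 °C"),
]
for ex in examples:
    print(ex["target"])
```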