In-Context Examples Suppress Scientific Knowledge Recall in LLMs

arXiv cs.AI / 5/1/2026


Key Points

  • The study finds that adding in-context examples can suppress LLMs’ ability to recall and use scientific knowledge during latent-structure recovery tasks.
  • Even when the in-context examples are generated from the same underlying formulas the model was pretrained on, the model shifts computation toward empirical pattern fitting rather than knowledge-driven derivation (a minimal sketch of the two prompting conditions follows this list).
  • Across 60 tasks in five scientific domains, 6,000 trials, and four different models, the “knowledge displacement” effect is consistent in direction.
  • The impact on accuracy depends on how the displaced (knowledge-based) strategy compares to the replacement (example-based) strategy: the shift can worsen results, leave them unchanged, or sometimes appear to improve them.
  • For practitioners using LLMs in scientific settings, the work suggests a cautionary approach: in-context examples may undermine the very domain knowledge they are meant to reinforce.
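To make the setup concrete, here is a minimal sketch of the two prompting conditions being contrasted: a zero-shot query versus the same query preceded by in-context examples generated from the formula the model is expected to know. The task wording, example format, and the first-order-decay domain are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of zero-shot vs. few-shot prompting for a latent-structure
# recovery task. Prompt wording and example format are assumptions for illustration.

def build_prompt(question: str, examples: list[tuple[str, str]] | None = None) -> str:
    """Zero-shot when `examples` is None; few-shot otherwise."""
    parts = []
    if examples:
        for x, y in examples:
            parts.append(f"Input: {x}\nAnswer: {y}")
    parts.append(f"Input: {question}\nAnswer:")
    return "\n\n".join(parts)

# In-context examples generated from the very formula the model should already
# know: first-order decay C(t) = C0 * exp(-k * t) with k = 0.35.
examples = [
    ("C0=10, t=1 -> C=?", "7.05"),
    ("C0=10, t=2 -> C=?", "4.97"),
]

zero_shot = build_prompt("C0=10, t=4 -> C=?")            # knowledge recall condition
few_shot = build_prompt("C0=10, t=4 -> C=?", examples)   # in-context example condition
```

The paper's finding, in these terms, is that the few-shot condition tends to push the model toward fitting the listed input-output pairs rather than invoking the decay law it already knows.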

Abstract

Scientific reasoning rarely stops at what is directly observable; it often requires uncovering hidden structure from data. From estimating reaction constants in chemistry to inferring demand elasticities in economics, this latent structure recovery is what distinguishes scientific reasoning from curve fitting. Large language models (LLMs) can often recall and apply relevant scientific formulas, but we show that this ability is surprisingly easy to suppress. We show that adding in-context examples makes models rely less on pretrained domain knowledge, even when those examples are generated by the very same formula. Rather than reinforcing knowledge-driven derivation, examples shift computation toward empirical pattern fitting. We document this knowledge displacement on 60 latent structure recovery tasks across five scientific domains, 6,000 trials, and four models. This displacement is consistent across domains, but its accuracy consequences depend on how the displaced strategy compares to the one that replaces it: the same shift can lower accuracy, leave it unchanged, or appear to improve it. In all cases, however, the model shifts away from knowledge-driven reasoning. For practitioners deploying LLMs on scientific tasks, the message is cautionary: in-context examples may displace, rather than reinforce, the knowledge they are intended to support.
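To see why the displaced and replacement strategies can differ in accuracy, consider a toy latent-structure recovery problem in the spirit of the abstract's chemistry example. The decay law, parameter values, and the quadratic baseline below are assumptions chosen for illustration; the sketch only contrasts a knowledge-driven estimate (using the known functional form) with a purely empirical pattern fit.

```python
# Illustrative sketch (not from the paper): recover a hidden rate constant k
# from data generated by a known first-order decay law C(t) = C0 * exp(-k * t).
import numpy as np

rng = np.random.default_rng(0)
k_true, c0 = 0.35, 10.0
t = np.linspace(0.0, 4.0, 9)                      # observed time points
c = c0 * np.exp(-k_true * t) * (1 + 0.01 * rng.standard_normal(t.size))

# Knowledge-driven strategy: use the known functional form.
# Taking logs linearizes the law, so the slope of log C vs. t is -k.
slope, _ = np.polyfit(t, np.log(c), deg=1)
k_knowledge = -slope

# Example-driven strategy: ignore the law and fit a generic pattern
# (here a quadratic) directly to the observed (t, C) pairs.
quad = np.polyfit(t, c, deg=2)

# Both descriptions match the observed range; extrapolation exposes the difference.
t_new = 10.0
c_knowledge = c0 * np.exp(-k_knowledge * t_new)    # stays near the true value
c_pattern = np.polyval(quad, t_new)                # drifts far from the true value here

print(f"recovered k: {k_knowledge:.3f} (true {k_true})")
print(f"C({t_new}) knowledge-driven: {c_knowledge:.3f}, pattern fit: {c_pattern:.3f}")
```

On the observed points the two fits look similar; they diverge in the recovered constant and in extrapolation beyond the data, which is one way the same shift away from knowledge-driven reasoning can lower accuracy, leave it unchanged, or appear to improve it depending on the task.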