Representation in large language models

arXiv cs.CL / 5/4/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • The paper examines a core theoretical dispute about what mechanisms drive large language model (LLM) behavior: representation-based information processing versus memorization and stochastic table look-up.
  • It frames the question as identifying what kind of algorithm LLMs implement, arguing that both representation-based processing and memorization-style behavior likely contribute.
  • The author discusses how the mechanism question has downstream consequences for philosophical and cognitive-science issues, such as whether LLMs could be said to have beliefs, intentions, concepts, knowledge, or understanding.
  • The paper proposes and defends practical techniques for investigating LLMs' internal representations and for building explanations grounded in that observed structure (an illustrative probing sketch follows this list).
  • The work aims to unblock broader theorizing by providing a foundation and tools for future research on language models and their successors.
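
The summary does not enumerate which investigative techniques the paper defends. As a concrete illustration of the genre, here is a minimal linear-probe sketch: a small classifier trained on a frozen model's hidden states to test whether some property is linearly decodable from the representations. The model choice (gpt2), the toy sentences, and the binary labels are illustrative assumptions, not drawn from the paper.

```python
# Minimal linear-probe sketch: test whether a (toy) property is linearly
# decodable from a frozen LLM's hidden states. All data here is illustrative.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any causal LM with accessible hidden states
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical dataset: sentences paired with a binary property.
texts = [
    "The cat sat on the mat.",
    "Paris is the capital of France.",
    "Colorless green ideas sleep furiously.",
    "Two plus two equals four.",
]
labels = [0, 1, 0, 1]  # e.g., 1 = expresses a factual claim (illustrative)

def last_token_state(text, layer=-1):
    """Hidden state of the final token at the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1].numpy()

X = [last_token_state(t) for t in texts]

# If the probe generalizes (on held-out data, with suitable controls), that
# is evidence the property is represented in the hidden states.
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("train accuracy:", probe.score(X, labels))
```

A standard caveat applies: a linearly decodable property is not necessarily one the model causally uses, so probing results are typically treated as one line of evidence alongside intervention-based methods rather than as settling the question on their own.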

Abstract

The extraordinary success of recent Large Language Models (LLMs) on a diverse array of tasks has led to an explosion of scientific and philosophical theorizing aimed at explaining how they do what they do. Unfortunately, disagreement over fundamental theoretical issues has led to stalemate, with entrenched camps of LLM optimists and pessimists often committed to very different views of how these systems work. Overcoming stalemate requires agreement on fundamental questions, and the goal of this paper is to address one such question, namely: is LLM behavior driven partly by representation-based information processing of the sort implicated in biological cognition, or is it driven entirely by processes of memorization and stochastic table look-up? This is a question about what kind of algorithm LLMs implement, and the answer carries serious implications for higher level questions about whether these systems have beliefs, intentions, concepts, knowledge, and understanding. I argue that LLM behavior is partially driven by representation-based information processing, and then I describe and defend a series of practical techniques for investigating these representations and developing explanations on their basis. The resulting account provides a groundwork for future theorizing about language models and their successors.