Are they human? Detecting large language models by probing human memory constraints

arXiv cs.AI / 4/2/2026


Key Points

  • The paper argues that online behavioral research validity depends on participants being human, but LLM-based agents can now pass many traditional “are you a human?” challenges.
  • It proposes an alternative detection strategy: look for tasks on which LLMs perform "too well," violating an established human cognitive limitation.
  • The authors focus on one such limitation, limited working memory capacity, and use a standard serial recall task to model and compare human and LLM behavior.
  • Results indicate that cognitive modeling on this standard serial recall task can distinguish online participants from LLMs even when the LLMs are instructed to mimic human working-memory constraints.
  • Overall, the study suggests that leveraging well-established cognitive phenomena can be a viable way to detect LLMs in human-subject settings.

Abstract

The validity of online behavioral research relies on study participants being human rather than machine. In the past, it was possible to detect machines by posing simple challenges that were easily solved by humans but not by machines. General-purpose agents based on large language models (LLMs) can now solve many of these challenges, threatening the validity of online behavioral research. Here we explore the idea of detecting humanness by using tasks that machines can solve too well to be human. Specifically, we probe for the existence of an established human cognitive constraint: limited working memory capacity. We show that cognitive modeling on a standard serial recall task can be used to distinguish online participants from LLMs even when the latter are specifically instructed to mimic human working memory constraints. Our results demonstrate that it is viable to use well-established cognitive phenomena to distinguish LLMs from humans.
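The detection idea can be made concrete with a toy sketch. The following is purely illustrative, not the paper's actual protocol or model: it scores a serial recall trial with strict positional scoring (the standard measure in serial recall experiments) and derives a crude span estimate. The intuition is that human spans cluster around a handful of items, whereas an LLM with the full presentation in its context window can recall arbitrarily long lists perfectly, which is the "too good to be human" signal.

```python
def score_serial_recall(presented, recalled):
    """Strict serial scoring: an item counts as correct only if it is
    recalled in its original serial position."""
    return [i < len(recalled) and recalled[i] == item
            for i, item in enumerate(presented)]

def span_estimate(trials):
    """Crude span estimate (illustrative only): the longest list length
    at which every position of a trial was recalled correctly."""
    best = 0
    for presented, recalled in trials:
        if all(score_serial_recall(presented, recalled)):
            best = max(best, len(presented))
    return best

# A human-like participant fails beyond ~7 items; a verbatim-copying
# agent would pass at any length.
human_trials = [
    ([3, 1, 4, 1, 5], [3, 1, 4, 1, 5]),                  # short list: perfect
    ([2, 7, 1, 8, 2, 8, 1, 8, 2, 8], [2, 7, 1, 8, 2]),   # long list: truncated recall
]
print(span_estimate(human_trials))  # span capped by the short list
```

A suspiciously high span, or serial-position curves without the usual primacy and recency structure, would flag a participant as likely non-human; the paper's actual approach fits a cognitive model to the recall data rather than using a raw span cutoff.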