Can we automatize scientific discovery in the cognitive sciences?

arXiv cs.AI / 2026/3/24


Key Points

  • The paper argues that cognitive science discovery is limited by manual intervention and a narrow hypothesis search driven by researchers’ intuition and backgrounds.
  • It proposes an end-to-end automated “in silico science of the mind” pipeline where LLMs generate experimental paradigms, foundation models simulate behavioral data, and LLMs synthesize cognitive-model code.
  • The workflow closes the loop by optimizing an “interestingness” score assessed by an LLM-critic, enabling iterative, high-throughput theory discovery.
  • The approach is positioned as a scalable engine for surfacing experiments and candidate mechanisms that can later be validated with real human participants.
  • Overall, it reframes cognitive-science theory development as an automated discovery loop akin to computational search over a large space of algorithmic hypotheses.
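The closed loop described in the bullets above can be sketched as a simple iterative search. This is a minimal, hypothetical illustration: every function name (`propose_paradigm`, `simulate_behavior`, `synthesize_model`, `score_interestingness`) is an illustrative stand-in for an LLM or foundation-model call, not the paper's actual implementation.

```python
import random

def propose_paradigm(history):
    # Stand-in for an LLM sampling a conceptually meaningful task structure.
    return {"id": len(history), "n_trials": 50}

def simulate_behavior(paradigm):
    # Stand-in for a foundation model of cognition simulating behavioral data.
    random.seed(paradigm["id"])
    return [random.random() for _ in range(paradigm["n_trials"])]

def synthesize_model(data):
    # Stand-in for LLM-based program synthesis over algorithmic hypotheses;
    # here it just fits a trivial mean-response model.
    mean = sum(data) / len(data)
    return {"predict": lambda: mean}

def score_interestingness(paradigm, model, history):
    # Stand-in for the LLM-critic's "interestingness" (conceptual-yield) metric.
    # Toy heuristic: novelty decays as more paradigms accumulate.
    return 1.0 / (1 + paradigm["id"])

def discovery_loop(n_iterations=5):
    """Run the automated discovery cycle and keep the highest-scoring result."""
    history, best = [], None
    for _ in range(n_iterations):
        paradigm = propose_paradigm(history)
        data = simulate_behavior(paradigm)
        model = synthesize_model(data)
        score = score_interestingness(paradigm, model, history)
        record = {"paradigm": paradigm, "model": model, "score": score}
        history.append(record)
        if best is None or score > best["score"]:
            best = record
    return best, history
```

In a real instantiation, each stub would be an API call to a language model or a simulated participant, and the critic's score would drive which paradigms are proposed next; the surviving candidates would then be validated with human participants.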

Abstract

The cognitive sciences aim to understand intelligence by formalizing underlying operations as computational models. Traditionally, this follows a cycle of discovery where researchers develop paradigms, collect data, and test predefined model classes. However, this manual pipeline is fundamentally constrained by the slow pace of human intervention and a search space limited by researchers' background and intuition. Here, we propose a paradigm shift toward a fully automated, in silico science of the mind that implements every stage of the discovery cycle using Large Language Models (LLMs). In this framework, experimental paradigms exploring conceptually meaningful task structures are directly sampled from an LLM. High-fidelity behavioral data are then simulated using foundation models of cognition. The tedious step of handcrafting cognitive models is replaced by LLM-based program synthesis, which performs a high-throughput search over a vast landscape of algorithmic hypotheses. Finally, the discovery loop is closed by optimizing for "interestingness", a metric of conceptual yield evaluated by an LLM-critic. By enabling a fast and scalable approach to theory development, this automated loop functions as a high-throughput in silico discovery engine, surfacing informative experiments and mechanisms for subsequent validation in real human populations.