An Auditable AI Agent Loop for Empirical Economics: A Case Study in Forecast Combination

arXiv stat.ML / 3/23/2026

Key Points

  • The study adapts an open-source AI agent-loop architecture to empirical economics and adds a post-search holdout evaluation to improve auditability.
  • In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all of those gains persist on the post-search holdout.
  • Logged search and holdout evaluation together increase transparency of adaptive specification search and help distinguish robust improvements from sample-specific findings.
  • The work illustrates how auditing mechanisms can make hidden researcher degrees of freedom visible when AI agents are used in empirical research.

Abstract

AI coding agents make empirical specification search fast and cheap, but they also widen hidden researcher degrees of freedom. Building on an open-source agent-loop architecture, this paper adapts that framework to an empirical economics workflow and adds a post-search holdout evaluation. In a forecast-combination illustration, multiple independent agent runs outperform standard benchmarks in the original rolling evaluation, but not all continue to do so on a post-search holdout. Logged search and holdout evaluation together make adaptive specification search more transparent and help distinguish robust improvements from sample-specific discoveries.
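
The logged-search-plus-holdout idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the data are synthetic, the "agent" is replaced by a plain random search over combination weights, a fixed search sample stands in for the rolling evaluation, and all names (`make_data`, `logged_search`, `report`) are hypothetical. The point is only the workflow: every trial is logged, the specification is frozen after search, and the holdout is scored once afterwards.

```python
import json
import random


def mse(weights, forecasts, actuals):
    """Mean squared error of a weighted combination of individual forecasts."""
    errors = [
        sum(w * f for w, f in zip(weights, row)) - y
        for row, y in zip(forecasts, actuals)
    ]
    return sum(e * e for e in errors) / len(errors)


def make_data(n, rng):
    """Synthetic target plus three noisy forecasters (illustrative assumption)."""
    actuals = [rng.gauss(0.0, 1.0) for _ in range(n)]
    forecasts = [[y + rng.gauss(0.0, s) for s in (0.5, 0.8, 1.2)] for y in actuals]
    return forecasts, actuals


def logged_search(forecasts, actuals, n_trials, rng):
    """Random search over weights on the simplex; every trial is logged."""
    k = len(forecasts[0])
    log = []
    for trial in range(n_trials):
        raw = [rng.random() for _ in range(k)]
        total = sum(raw)
        weights = [r / total for r in raw]
        log.append({
            "trial": trial,
            "weights": weights,
            "search_mse": mse(weights, forecasts, actuals),
        })
    best = min(log, key=lambda entry: entry["search_mse"])
    return best, log


rng = random.Random(0)
forecasts, actuals = make_data(300, rng)

# The holdout is split off before any search and is not touched until the end.
split = 200
search_f, search_y = forecasts[:split], actuals[:split]
holdout_f, holdout_y = forecasts[split:], actuals[split:]

best, log = logged_search(search_f, search_y, n_trials=50, rng=rng)

# Equal weights serve as the benchmark on both samples.
equal = [1.0 / 3.0] * 3
report = {
    "n_trials_logged": len(log),
    "selected_weights": best["weights"],
    "search": {
        "selected_mse": best["search_mse"],
        "benchmark_mse": mse(equal, search_f, search_y),
    },
    "holdout": {
        "selected_mse": mse(best["weights"], holdout_f, holdout_y),
        "benchmark_mse": mse(equal, holdout_f, holdout_y),
    },
}
print(json.dumps(report, indent=2))
```

Comparing the `search` and `holdout` entries of the report shows whether the selected specification's advantage over the benchmark survives outside the sample it was chosen on, and the full `log` records how many alternatives were tried along the way.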