InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI

arXiv cs.AI / 4/7/2026


Key Points

  • InferenceEvolve is proposed as an evolutionary framework that uses large language models to automatically discover and iteratively refine causal effect estimators.
  • Experiments on widely used benchmarks show the evolved estimators consistently outperform established baselines, and the best one was also compared against 58 human submissions in a recent community competition.
  • The best evolved estimator is reported to be on the Pareto frontier across two evaluation metrics, indicating a favorable trade-off between competing criteria.
  • The work introduces robust proxy objectives for cases where semi-synthetic outcomes are unavailable, achieving competitive results in partially observed settings.
  • Trajectory analysis suggests the LLM-guided evolutionary agents progressively discover more sophisticated strategies tailored to the unrevealed data-generating mechanisms.
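
To make the Pareto-frontier point concrete: an estimator is on the frontier when no other entry is at least as good on both metrics and strictly better on one. The sketch below is only an illustration. The summary does not name the two metrics or say which direction is better, so the function assumes lower is better on both, and the scores are invented placeholders, not results from the paper.

```python
from typing import List, Tuple


def pareto_frontier(scores: List[Tuple[float, float]]) -> List[int]:
    """Return indices of entries that no other entry dominates.

    Assumption (not from the paper): lower is better on both metrics.
    Entry j dominates entry i if j is no worse on both metrics and
    strictly better on at least one.
    """
    frontier = []
    for i, (a_i, b_i) in enumerate(scores):
        dominated = any(
            (a_j <= a_i and b_j <= b_i) and (a_j < a_i or b_j < b_i)
            for j, (a_j, b_j) in enumerate(scores)
            if j != i
        )
        if not dominated:
            frontier.append(i)
    return frontier


# Invented placeholder scores (metric_1, metric_2) for four hypothetical entries.
example_scores = [(0.10, 0.90), (0.20, 0.40), (0.30, 0.50), (0.50, 0.10)]
frontier_indices = pareto_frontier(example_scores)
print(frontier_indices)
```

Here the third entry is dominated by the second, which is better on both metrics. The other three each win on at least one metric against every rival, so they form the frontier.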

Abstract

Causal inference is central to scientific discovery, yet choosing appropriate methods remains challenging because of the complexity of both statistical methodology and real-world data. Inspired by the success of artificial intelligence in accelerating scientific discovery, we introduce InferenceEvolve, an evolutionary framework that uses large language models to discover and iteratively refine causal methods. Across widely used benchmarks, InferenceEvolve yields estimators that consistently outperform established baselines: against 58 human submissions in a recent community competition, our best evolved estimator lay on the Pareto frontier across two evaluation metrics. We also developed robust proxy objectives for settings without semi-synthetic outcomes, with competitive results. Analysis of the evolutionary trajectories shows that agents progressively discover sophisticated strategies tailored to unrevealed data-generating mechanisms. These findings suggest that language-model-guided evolution can optimize structured scientific programs such as causal inference, even when outcomes are only partially observed.
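
The abstract describes an evolutionary loop of proposing, scoring, and keeping the best candidates, but gives no implementation detail. The sketch below shows only the generic shape of such a loop and is not the authors' algorithm. Everything in it is a hypothetical stand-in: `propose` would be an LLM call that rewrites an estimator program, while here it merely perturbs a number, and `score` would be a benchmark or proxy objective, while here it is a toy squared error.

```python
import random
from typing import Callable, Tuple


def evolve(
    seed: float,
    propose: Callable[[float, random.Random], float],
    score: Callable[[float], float],
    generations: int = 50,
    population_size: int = 4,
    rng_seed: int = 0,
) -> Tuple[float, float]:
    """Generic keep-the-best evolutionary loop (lower score is better).

    Returns (best_score, best_candidate). All names are illustrative.
    """
    rng = random.Random(rng_seed)
    population = [(score(seed), seed)]
    for _ in range(generations):
        # Pick a parent from the current survivors.
        _, parent = rng.choice(population)
        # Stand-in for the LLM step: produce a modified candidate.
        child = propose(parent, rng)
        population.append((score(child), child))
        # Keep only the best few candidates for the next round.
        population.sort(key=lambda pair: pair[0])
        population = population[:population_size]
    return population[0]


def toy_propose(candidate: float, rng: random.Random) -> float:
    # Hypothetical mutation: a small random perturbation.
    return candidate + rng.gauss(0.0, 0.5)


def toy_score(candidate: float) -> float:
    # Hypothetical objective: squared distance to an arbitrary target of 3.0.
    return (candidate - 3.0) ** 2


best_score, best_candidate = evolve(0.0, toy_propose, toy_score)
```

Because the survivors are re-sorted and truncated each round, the best score never gets worse from one generation to the next. That is the property that makes this kind of loop usable with a noisy proposer.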