EVIL: Evolving Interpretable Algorithms for Zero-Shot Inference on Event Sequences and Time Series with LLMs

arXiv cs.LG / 4/20/2026


Key Points

  • The paper introduces EVIL, an LLM-guided evolutionary search method that evolves simple, fully interpretable Python/NumPy programs for dynamical systems inference without training neural networks on large datasets.
  • EVIL performs zero-shot, in-context inference across multiple datasets by evolving a single compact inference function that generalizes across evaluation sets.
  • The approach is evaluated on three time- and event-sequence tasks—next-event prediction for temporal point processes, rate matrix estimation for Markov jump processes, and time-series imputation.
  • Results indicate the evolved algorithms are often competitive with or outperform state-of-the-art deep learning models while being orders of magnitude faster and maintaining full interpretability.
  • The work claims a first-of-its-kind demonstration of LLM-guided program evolution producing one unified inference function across these dynamical-systems problems.
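The LLM-guided evolutionary search described above can be sketched as a simple propose-score-select loop. This is a minimal, hypothetical skeleton, not the paper's actual procedure: in EVIL the `propose` step would prompt an LLM to mutate or rewrite a candidate program, while here it is just an arbitrary callable so the structure is runnable.

```python
import random

def evolve(initial_programs, propose, fitness, generations=20, population=8):
    """Minimal sketch of an LLM-guided evolutionary search loop
    (illustrative structure only; the paper's procedure may differ).

    propose(parent) -- in EVIL, an LLM call that mutates a candidate
                       program; here, any callable.
    fitness(prog)   -- scores a candidate; higher is better.
    """
    pool = list(initial_programs)
    for _ in range(generations):
        # Ask the (hypothetical) proposer for mutated children of random parents.
        children = [propose(random.choice(pool)) for _ in range(population)]
        # Keep only the fittest candidates; the best survivor never gets worse.
        pool = sorted(pool + children, key=fitness, reverse=True)[:population]
    return max(pool, key=fitness)
```

As a toy usage, candidates can be integers with `propose` nudging them by ±1 and `fitness` rewarding closeness to a target; because survivors are kept across generations, the best fitness is monotone non-decreasing.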

Abstract

We introduce EVIL (EVolving Interpretable algorithms with LLMs), an approach that uses LLM-guided evolutionary search to discover simple, interpretable algorithms for dynamical systems inference. Rather than training neural networks on large datasets, EVIL evolves pure Python/NumPy programs that perform zero-shot, in-context inference across datasets. We apply EVIL to three distinct tasks: next-event prediction in temporal point processes, rate matrix estimation for Markov jump processes, and time series imputation. In each case, a single evolved algorithm generalizes across all evaluation datasets without per-dataset training (analogous to an amortized inference model). To the best of our knowledge, this is the first work to show that LLM-guided program evolution can discover a single compact inference function for these dynamical-systems problems. Across the three domains, the discovered algorithms are often competitive with, and even outperform, state-of-the-art deep learning models while being orders of magnitude faster and remaining fully interpretable.
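To make the idea of a "compact, fully interpretable NumPy inference function" concrete, here is a toy example of the kind of program such a search might produce for zero-shot next-event prediction in a temporal point process. This is purely illustrative and is not the algorithm discovered in the paper: it predicts the next event time as the last observed time plus the median inter-event gap, with no per-dataset training.

```python
import numpy as np

def predict_next_event(times):
    """Toy interpretable zero-shot predictor (illustrative only, not the
    paper's evolved algorithm): estimate the next event time as the last
    observed event time plus the median inter-event interval."""
    times = np.asarray(times, dtype=float)
    gaps = np.diff(times)          # observed inter-event intervals
    return times[-1] + np.median(gaps)
```

Every quantity in the function is directly inspectable, which is the sense in which such programs remain interpretable: a reader can see exactly which statistic of the event history drives the prediction.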