Hubble: An LLM-Driven Agentic Framework for Safe and Automated Alpha Factor Discovery

arXiv cs.AI · April 14, 2026


Key Points

  • Hubble is proposed as an LLM-driven, closed-loop agentic framework for automated discovery of predictive alpha factors in quantitative finance, addressing the large search space and low signal-to-noise ratios.
  • The approach uses an LLM to propose candidates under a domain-specific operator language, then executes them within an AST-based sandbox to enforce deterministic safety constraints and improve interpretability.
  • Candidate factors are scored through a rigorous statistical pipeline, including cross-sectional RankIC, annualized Information Ratio, and portfolio turnover.
  • An evolutionary feedback loop returns top-performing factors and structured error diagnostics to the LLM for iterative refinement across multiple generation rounds.
  • In experiments on 30 U.S. equities over 752 trading days, Hubble evaluated 181 syntactically valid factors from 122 candidates across three rounds, reaching a peak composite score of 0.827 with full computational stability.
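The AST-based sandbox described above can be illustrated with a short sketch. The paper's actual operator language and safety checks are not detailed in this summary, so the whitelisted operator names (`ts_mean`, `rank`, `delta`, etc.) and node set below are illustrative assumptions: an LLM-proposed factor expression is parsed into an Abstract Syntax Tree and rejected unless every node belongs to a small, deterministic whitelist.

```python
import ast

# Hypothetical whitelist of domain operators; the paper's actual
# operator language is not specified in this summary.
ALLOWED_FUNCS = {"ts_mean", "ts_std", "rank", "delta", "corr"}

# Only pure arithmetic expressions over whitelisted calls are permitted:
# no attribute access, subscripts, lambdas, or imports can appear.
ALLOWED_NODES = (
    ast.Expression, ast.Call, ast.Name, ast.Load, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub,
)

def validate_factor(expr: str) -> bool:
    """Return True iff expr parses and uses only whitelisted AST nodes
    and operator names — a deterministic pre-execution safety gate."""
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        if isinstance(node, ast.Call):
            # Calls must be direct names drawn from the operator language.
            if not (isinstance(node.func, ast.Name)
                    and node.func.id in ALLOWED_FUNCS):
                return False
    return True
```

Because validation happens on the syntax tree before any evaluation, arbitrary code proposed by the LLM (e.g. attribute access or `__import__`) is rejected without ever being executed, which is one plausible reading of the "deterministic safety constraints" the framework enforces.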

Abstract

Discovering predictive alpha factors in quantitative finance remains a formidable challenge due to the vast combinatorial search space and inherently low signal-to-noise ratios in financial data. Existing automated methods, particularly genetic programming, often produce complex, uninterpretable formulas prone to overfitting. We introduce Hubble, a closed-loop factor mining framework that leverages Large Language Models (LLMs) as intelligent search heuristics, constrained by a domain-specific operator language and an Abstract Syntax Tree (AST)-based execution sandbox. The framework evaluates candidate factors through a rigorous statistical pipeline encompassing cross-sectional Rank Information Coefficient (RankIC), annualized Information Ratio, and portfolio turnover. An evolutionary feedback mechanism returns top-performing factors and structured error diagnostics to the LLM, enabling iterative refinement across multiple generation rounds. In experiments conducted on a panel of 30 U.S. equities over 752 trading days, the system evaluated 181 syntactically valid factors from 122 unique candidates across three rounds, achieving a peak composite score of 0.827 with 100% computational stability. Our results demonstrate that combining LLM-driven generation with deterministic safety constraints yields an effective, interpretable, and reproducible approach to automated factor discovery.
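The evaluation pipeline the abstract describes rests on two standard statistics, which can be sketched as follows. This is a minimal NumPy illustration of the general definitions, not the paper's implementation: the daily cross-sectional RankIC is the Spearman correlation between factor values and next-period returns across the stock universe, and the annualized Information Ratio scales the mean-to-volatility ratio of the daily IC series by the square root of an assumed 252 trading days. (The rank helper below ignores ties for brevity.)

```python
import numpy as np

def _rank(x: np.ndarray) -> np.ndarray:
    # Ordinal ranks (0..n-1); ties are broken arbitrarily — illustrative only.
    return np.argsort(np.argsort(x)).astype(float)

def daily_rank_ic(factor: np.ndarray, fwd_ret: np.ndarray) -> float:
    """Cross-sectional RankIC for one day: Spearman correlation between
    the factor values and forward returns over the stock universe."""
    return float(np.corrcoef(_rank(factor), _rank(fwd_ret))[0, 1])

def annualized_ir(daily_ics, periods_per_year: int = 252) -> float:
    """Annualized Information Ratio of a daily IC series:
    mean(IC) / std(IC) * sqrt(252)."""
    ics = np.asarray(daily_ics, dtype=float)
    return float(ics.mean() / ics.std(ddof=1) * np.sqrt(periods_per_year))
```

A factor that perfectly orders next-day returns yields a daily RankIC of 1.0; in practice the pipeline would aggregate such daily ICs over the 752-day panel and combine them with turnover into the composite score the paper reports.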