Neural Structure Embedding for Symbolic Regression via Continuous Structure Search and Coefficient Optimization

arXiv cs.LG / March 25, 2026


Key Points

  • The paper proposes SRCO, a unified framework for symbolic regression that replaces discrete, combinatorial structure search with a continuous representation that can be optimized efficiently.
  • SRCO first generates exploratory equations with existing symbolic regression methods, then trains a Transformer to embed symbolic structures into a continuous space suitable for optimization.
  • It performs continuous structure search in the embedding space using gradient-based or sampling-based methods, reducing computational cost and improving scalability.
  • After a candidate structure is found, SRCO treats symbolic coefficients as learnable parameters and uses gradient-based coefficient optimization to improve numerical accuracy.
  • Experiments on synthetic and real-world datasets report consistent gains over state-of-the-art approaches in accuracy, robustness, and search efficiency, suggesting a new paradigm linking equation discovery with embedding learning and optimization.
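
The coefficient-optimization step in the last bullets can be illustrated with a toy sketch: once a candidate structure is fixed, its numeric coefficients are treated as learnable parameters and fitted by gradient descent on mean-squared error. The structure `f(x) = c0*x^2 + c1*x + c2` below is a hypothetical example, not one from the paper, and plain gradient descent stands in for whatever optimizer SRCO actually uses.

```python
# Minimal sketch of gradient-based coefficient fitting for a fixed
# candidate structure f(x) = c0*x^2 + c1*x + c2 (illustrative only).

def predict(c, x):
    return c[0] * x * x + c[1] * x + c[2]

# Synthetic data generated from a known ground truth (2.0, 0.5, 1.0)
xs = [i * 0.1 for i in range(-10, 11)]
ys = [predict((2.0, 0.5, 1.0), x) for x in xs]

c = [0.0, 0.0, 0.0]          # initial coefficient guesses
lr = 0.1                     # learning rate
for _ in range(2000):
    g = [0.0, 0.0, 0.0]      # gradient of the mean-squared error
    for x, y in zip(xs, ys):
        err = predict(c, x) - y
        g[0] += 2 * err * x * x
        g[1] += 2 * err * x
        g[2] += 2 * err
    for i in range(3):
        c[i] -= lr * g[i] / len(xs)

print([round(v, 3) for v in c])   # approaches [2.0, 0.5, 1.0]
```

Because this toy structure is linear in its coefficients, the loss is convex and gradient descent recovers the ground-truth values; for structures that are nonlinear in their coefficients, the same loop applies but convergence depends on initialization.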

Abstract

Symbolic regression aims to discover human-interpretable equations that explain observational data. However, existing approaches rely heavily on discrete structure search (e.g., genetic programming), which often leads to high computational cost, unstable performance, and limited scalability to large equation spaces. To address these challenges, we propose SRCO, a unified embedding-driven framework for symbolic regression that transforms symbolic structures into a continuous, optimizable representation space. The framework consists of three key components: (1) structure embedding: we first generate a large pool of exploratory equations using traditional symbolic regression algorithms and train a Transformer model to compress symbolic structures into a continuous embedding space; (2) continuous structure search: the embedding space enables efficient exploration using gradient-based or sampling-based optimization, significantly reducing the cost of navigating the combinatorial structure space; and (3) coefficient optimization: for each discovered structure, we treat symbolic coefficients as learnable parameters and apply gradient optimization to obtain accurate numerical values. Experiments on synthetic and real-world datasets show that our approach consistently outperforms state-of-the-art methods in equation accuracy, robustness, and search efficiency. This work introduces a new paradigm for symbolic regression by bridging symbolic equation discovery with continuous embedding learning and optimization.
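The continuous structure search described in component (2) can be caricatured in a few lines: replace the Transformer encoder/decoder with a tiny hand-built library of structures, each assigned a hypothetical 2-D embedding, decode a latent point to its nearest embedded structure, and run a sampling-based search over the latent space. All names and embeddings below are invented for illustration; the point is only that search happens in a continuous space rather than over discrete trees.

```python
import random

random.seed(0)

# Toy stand-in for SRCO's search loop: three candidate structures with
# hypothetical 2-D embeddings (the paper learns these with a Transformer).
structures = {
    "quadratic": ((0.0, 0.0), lambda x: x * x),
    "cubic":     ((1.0, 0.0), lambda x: x ** 3),
    "linear":    ((0.0, 1.0), lambda x: x),
}

xs = [i * 0.1 for i in range(-10, 11)]
target = [x * x for x in xs]          # data generated by the quadratic

def decode(z):
    # Nearest-neighbour "decoder": map a latent point to the closest
    # embedded structure.
    return min(structures.items(),
               key=lambda kv: sum((a - b) ** 2 for a, b in zip(kv[1][0], z)))

def score(z):
    _, (_, f) = decode(z)
    return sum((f(x) - y) ** 2 for x, y in zip(xs, target))

# Sampling-based search: draw latent points, keep the best-scoring one.
best_z, best_s = None, float("inf")
for _ in range(300):
    z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
    s = score(z)
    if s < best_s:
        best_z, best_s = z, s

print(decode(best_z)[0])              # the quadratic recovers the data
```

The same latent points could instead be refined with gradients if the decoder were differentiable, which is the option the abstract mentions alongside sampling.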