Instruction Set and Language for Symbolic Regression

arXiv cs.CL / 3/24/2026

Key Points

  • The paper identifies “structural redundancy” in symbolic regression, where multiple node-numbering schemes for the same expression create redundant candidates in the search space and waste fitness evaluations.
  • It introduces IsalSR, which encodes expression DAGs as strings over a compact two-tier alphabet.
  • IsalSR computes a pruned canonical string that is a complete labeled-DAG isomorphism invariant, collapsing equivalent DAG representations into a single canonical form.
  • By enforcing canonicalization, the method aims to eliminate duplicate fitness evaluations without reducing the genuine diversity of the symbolic regression search.
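
The idea of collapsing renumbered copies of one expression can be illustrated with a toy sketch. This is not IsalSR's encoding or its canonicalization algorithm: the tuple-based DAG format, the helper names (relabel, encode, canonical_key, evaluate_once) and the brute-force search over all node permutations are assumptions made here for illustration, and the brute force is only practical for very small DAGs.

```python
from itertools import permutations

# Hypothetical toy format (not IsalSR's): a DAG is a tuple of nodes,
# node i is (label, child_ids), and child order is significant.

def relabel(dag, perm):
    """Renumber the nodes so that old node i becomes new node perm[i]."""
    out = [None] * len(dag)
    for old, (label, children) in enumerate(dag):
        out[perm[old]] = (label, tuple(perm[c] for c in children))
    return tuple(out)

def encode(dag):
    """Numbering-dependent string encoding of a DAG."""
    return ";".join(f"{label}({','.join(map(str, ch))})" for label, ch in dag)

def canonical_key(dag):
    """Smallest encoding over all renumberings (brute force, O(n!))."""
    return min(encode(relabel(dag, p)) for p in permutations(range(len(dag))))

def evaluate_once(candidates, fitness):
    """Call fitness only once per canonical key."""
    cache = {}
    for dag in candidates:
        key = canonical_key(dag)
        if key not in cache:
            cache[key] = fitness(dag)
    return cache

# x*x + sin(x), numbered in two different ways.
dag_a = (("add", (1, 2)), ("mul", (3, 3)), ("sin", (3,)), ("x", ()))
dag_b = (("x", ()), ("sin", (0,)), ("mul", (0, 0)), ("add", (2, 1)))
# x*x + cos(x): a genuinely different expression.
dag_c = (("add", (1, 2)), ("mul", (3, 3)), ("cos", (3,)), ("x", ()))
```

Here dag_a and dag_b have different raw encodings but share one canonical key, so evaluate_once scores them a single time, while dag_c keeps its own key and is scored separately.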

Abstract

A fundamental but largely unaddressed obstacle in symbolic regression (SR) is structural redundancy: every expression DAG admits many distinct node-numbering schemes that all encode the same expression, each occupying a separate point in the search space and consuming fitness evaluations without adding diversity. We present IsalSR (Instruction Set and Language for Symbolic Regression), a representation framework that encodes expression DAGs as strings over a compact two-tier alphabet and computes a pruned canonical string, a complete labeled-DAG isomorphism invariant, that collapses all equivalent representations into a single canonical form.
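
To get a rough sense of how large this redundancy can be, the count below uses the standard orbit-stabilizer argument. It assumes that a "numbering" is any bijection from the n nodes of a DAG G to {1, ..., n}, which may be broader than the numbering schemes the paper considers; the symbols n, G and Aut(G) are introduced here and are not taken from the abstract.

```latex
% Assumption: a numbering is any bijection from the n nodes of G to {1, ..., n}.
% Two numberings yield the same numbered DAG exactly when they differ by a
% label-preserving automorphism of G, so by the orbit-stabilizer theorem:
\[
  \#\{\text{distinct numbered encodings of } G\} \;=\; \frac{n!}{|\mathrm{Aut}(G)|}
\]
% Example: x*x + sin(x) as a DAG has n = 4 nodes (add, mul, sin, x) with
% pairwise distinct labels, so |Aut(G)| = 1 and the count is 4! = 24.
```

Under this assumption a single four-node expression already occupies 24 points of a numbering-sensitive search space, and the count grows factorially with the number of nodes whenever the DAG has few symmetries.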