Optimization-Free Topological Sort for Causal Discovery via the Schur Complement of Score Jacobians

arXiv cs.LG / 4/29/2026

Key Points

  • The paper argues that continuous causal discovery can be decoupled from non-convex acyclicity-penalty optimization, which often leads to local optima and scalability limits.
  • It introduces the Score-Schur Topological Sort (SSTS), which recovers a topological order directly from unconstrained generative models without requiring constrained structural optimization.
  • The authors show that, under linear assumptions, iterative graph marginalization is mathematically equivalent to computing the Schur complement of the Score-Jacobian Information Matrix (SJIM), turning the acyclicity constraint into an algebraic step.
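  • The resulting algebraic procedure has a dominant cost of O(d^3) operations, where d is the number of variables. For non-linear systems, the authors formulate the expectation gap of Schur marginalization and introduce Block-SSTS, which compresses extraction depth and bounds structural error.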
  • Experiments indicate SSTS can analyze non-linear causal graphs up to d=1000, suggesting that once the optimization hurdle is bypassed, performance is mainly limited by finite-sample estimation variance in the learned score geometry.
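
The Schur-complement equivalence in the third point can be illustrated with a standard Gaussian identity. This summary does not give the paper's exact definition of the SJIM, so the block below assumes the simplest linear case: a zero-mean Gaussian whose score Jacobian is the negative precision matrix. The symbols \Sigma, \Theta, A, and b are notation chosen here for illustration, not taken from the paper.

```latex
% Assumption: x ~ N(0, \Sigma), precision \Theta = \Sigma^{-1};
% A indexes the variables kept, b the variable(s) marginalized out.
\begin{aligned}
s(x) &= \nabla_x \log p(x) = -\Theta x, \qquad \nabla_x s(x) = -\Theta, \\
\Theta &= \begin{pmatrix} \Theta_{AA} & \Theta_{Ab} \\ \Theta_{bA} & \Theta_{bb} \end{pmatrix}, \\
\left(\Sigma_{AA}\right)^{-1} &= \Theta_{AA} - \Theta_{Ab}\,\Theta_{bb}^{-1}\,\Theta_{bA}.
\end{aligned}
```

The last line says that the precision matrix of the marginal over A is the Schur complement of \Theta_{bb} in \Theta. Removing a single variable this way is a rank-one update costing O(d^2), so d successive removals cost O(d^3), which is one plausible reading of the stated complexity.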

Abstract

Continuous causal discovery typically couples representation learning with structural optimization via non-convex acyclicity penalties, which subjects solvers to local optima and restricts scalability in high-dimensional regimes. We propose a decoupled paradigm that shifts the causal discovery bottleneck from non-convex optimization to statistical score estimation. We introduce the Score-Schur Topological Sort (SSTS), an algorithm that extracts topological order directly from unconstrained generative models, bypassing constrained structure optimization. We establish that the causal hierarchy leaves a geometric signature within the score function: iterative graph marginalization is mathematically equivalent to computing the Schur complement of the Score-Jacobian Information Matrix (SJIM) under linear conditions. This translates the acyclicity constraint into an algebraic procedure with a dominant cost of O(d^3) operations. For non-linear systems, we formulate the expectation gap of Schur marginalization and introduce Block-SSTS to compress extraction depth, bounding structural error. Empirically, SSTS allows causal structural analysis on non-linear graphs up to d=1000. At this scale, our framework indicates that once the non-convex optimization bottleneck is mathematically bypassed, the structural fidelity of continuous causal discovery is bounded by the finite-sample estimation variance of the global score geometry. By reducing graph extraction to matrix operations, this work reframes scalable causal discovery from a constrained optimization problem to a statistical estimation challenge.
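
To make the linear case concrete, here is a minimal sketch of a Schur-complement topological sort. It is not the paper's SSTS implementation. It assumes a linear Gaussian structural model with unit noise variances, and it assumes the exact precision matrix is available in place of an SJIM estimated from a learned score model. Under those assumptions a leaf node is the one with the smallest diagonal entry of the precision matrix, which is the selection rule used here. The function names (`random_linear_dag`, `precision_from_dag`, `schur_topological_order`, `is_valid_order`) are hypothetical.

```python
import numpy as np


def random_linear_dag(d, edge_prob=0.3, seed=0):
    """Random weighted DAG; B[i, j] != 0 encodes the edge i -> j."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(d)  # hidden causal order
    B = np.zeros((d, d))
    for a in range(d):
        for b in range(a + 1, d):
            if rng.random() < edge_prob:
                sign = rng.choice([-1.0, 1.0])
                B[perm[a], perm[b]] = sign * rng.uniform(0.5, 1.5)
    return B


def precision_from_dag(B):
    """Precision of X = B^T X + e with e ~ N(0, I) (unit-variance assumption)."""
    I = np.eye(B.shape[0])
    return (I - B) @ (I - B).T


def schur_topological_order(theta):
    """Peel off leaves one at a time, marginalizing each via a Schur complement."""
    theta = theta.copy()
    remaining = list(range(theta.shape[0]))
    leaves_first = []
    while remaining:
        # With unit noise variances, a leaf has diagonal entry 1 and any
        # node with children has a strictly larger entry.
        j = int(np.argmin(np.diag(theta)))
        leaves_first.append(remaining.pop(j))
        keep = np.arange(theta.shape[0]) != j
        col = theta[keep, j]
        # Schur complement of theta[j, j]: precision of the marginal
        # over the remaining variables (rank-one update, O(d^2)).
        theta = theta[np.ix_(keep, keep)] - np.outer(col, col) / theta[j, j]
    return leaves_first[::-1]  # parents before children


def is_valid_order(B, order):
    """True if every edge i -> j has i placed before j in the order."""
    pos = {v: k for k, v in enumerate(order)}
    rows, cols = np.nonzero(B)
    return all(pos[int(i)] < pos[int(j)] for i, j in zip(rows, cols))


if __name__ == "__main__":
    B = random_linear_dag(20, seed=0)
    order = schur_topological_order(precision_from_dag(B))
    print("valid topological order:", is_valid_order(B, order))
```

With a precision matrix estimated from finite samples, the diagonal comparison becomes noisy, which is consistent with the abstract's point that accuracy is then limited by estimation variance rather than by optimization.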