Differentiable Symbolic Planning: A Neural Architecture for Constraint Reasoning with Learned Feasibility

arXiv cs.AI / 4/6/2026


Key Points

  • The paper proposes Differentiable Symbolic Planning (DSP), a neural architecture designed to handle logical/physical constraint reasoning while staying fully differentiable.
  • DSP introduces a per-node feasibility channel (phi) and a learned global feasibility aggregator (Phi) to track and combine evidence of constraint satisfaction during symbolic reasoning.
  • Sparsemax attention enables exact-zero discrete rule selection, letting DSP bridge discrete symbolic steps and gradient-based learning (see the sketch after this list).
  • When DSP is integrated into a Universal Cognitive Kernel (UCK), the system shows strong benchmark results across graph reachability, Boolean satisfiability, and planning feasibility with substantial generalization gains.
  • Ablations indicate that removing global Phi aggregation sharply degrades performance, and the learned feasibility signal (phi) develops interpretable values for feasible vs. infeasible cases without supervision.
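
Sparsemax (Martins & Astudillo, 2016) projects logits onto the probability simplex and, unlike softmax, can assign exactly zero weight to low-scoring candidates; this is what lets DSP make hard rule selections while remaining differentiable. The summary includes no reference code, so the following is a minimal PyTorch sketch of the standard sparsemax forward pass; the function name and example logits are illustrative, not taken from the paper.

```python
import torch

def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax over the last dimension (Martins & Astudillo, 2016).

    Projects logits onto the probability simplex; unlike softmax,
    losing entries receive exactly zero probability.
    """
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    z_cumsum = z_sorted.cumsum(dim=-1)
    # Support set: sorted entries still above the running threshold.
    support = 1 + k * z_sorted > z_cumsum
    k_max = support.sum(dim=-1, keepdim=True)              # support size
    tau = (z_cumsum.gather(-1, k_max - 1) - 1) / k_max.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)

# Rule-selection logits for four candidate rules: softmax would spread
# probability over all of them, sparsemax zeroes out the clear losers.
logits = torch.tensor([2.0, 1.9, 0.1, -1.0])
print(sparsemax(logits))  # tensor([0.5500, 0.4500, 0.0000, 0.0000])
```

Because the zeros are exact rather than merely small, gradients flow only through the selected rules, which is the "exact-zero discrete rule selection" the bullet above refers to.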

Abstract

Neural networks excel at pattern recognition but struggle with constraint reasoning -- determining whether configurations satisfy logical or physical constraints. We introduce Differentiable Symbolic Planning (DSP), a neural architecture that performs discrete symbolic reasoning while remaining fully differentiable. DSP maintains a feasibility channel (phi) that tracks constraint satisfaction evidence at each node, aggregates this into a global feasibility signal (Phi) through learned rule-weighted combination, and uses sparsemax attention to achieve exact-zero discrete rule selection. We integrate DSP into a Universal Cognitive Kernel (UCK) that combines graph attention with iterative constraint propagation. Evaluated on three constraint reasoning benchmarks -- graph reachability, Boolean satisfiability, and planning feasibility -- UCK+DSP achieves 97.4% accuracy on planning under 4x size generalization (vs. 59.7% for ablated baselines), 96.4% on SAT under 2x generalization, and maintains balanced performance on both positive and negative classes where standard neural approaches collapse. Ablation studies reveal that global Phi aggregation is critical: removing it causes accuracy to drop from 98% to 64%. The learned phi signal exhibits interpretable semantics, with values of +18 for feasible cases and -13 for infeasible cases emerging without supervision.
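
The abstract says only that per-node evidence phi is aggregated into the global signal Phi "through learned rule-weighted combination," without giving the formula. One plausible reading, sketched below under that assumption, is a weighted sum of per-node phi values whose weights come from a learned scoring head, normalized with the sparsemax defined above so that irrelevant nodes can be dropped entirely. All class, parameter, and shape choices here are hypothetical, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class GlobalFeasibilityAggregator(nn.Module):
    """Hypothetical sketch of DSP's phi -> Phi aggregation.

    Combines per-node feasibility evidence phi into one global scalar
    Phi per example via learned, sparsemax-normalized weights. This is
    one plausible reading of "learned rule-weighted combination".
    """
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)  # learned per-node weight logits

    def forward(self, node_states: torch.Tensor, phi: torch.Tensor) -> torch.Tensor:
        # node_states: [batch, nodes, hidden_dim]; phi: [batch, nodes]
        logits = self.score(node_states).squeeze(-1)  # [batch, nodes]
        weights = sparsemax(logits)                   # reuses the sketch above
        return (weights * phi).sum(dim=-1)            # Phi: [batch]

# Toy usage: 2 examples, 5 nodes each, 16-dim node states.
agg = GlobalFeasibilityAggregator(hidden_dim=16)
Phi = agg(torch.randn(2, 5, 16), torch.randn(2, 5))  # one scalar per example
```

Under this reading, the ablation result is intuitive: without the learned global aggregation, the model sees only local evidence and has no single signal that says whether the configuration as a whole is feasible.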