Compiling Deterministic Structure into SLM Harnesses

arXiv cs.AI / April 21, 2026


Key Points

  • The paper targets enterprise deployment of small language models (SLMs), which cannot self-correct reasoning errors, while frontier models are too costly, or restricted by data-sovereignty requirements, for high-volume use.
  • It proposes Semantic Gradient Descent (SGDe), a teacher-student method that compiles agentic workflows into discrete execution plans (DAGs, system prompts, and deterministic code) rather than relying on stochastic training alone.
  • SGDe uses a “frontier teacher” to generate natural-language critiques that act like directional gradients to iteratively refine the SLM’s workflow artifacts in a discrete semantic space.
  • The authors formalize SGDe under a PAC learning framework and claim convergence with as few as three training examples on targeted synthetic tasks by treating the teacher as a statistical prior.
  • Experiments on an adversarially synthesized GSM-Hard-derived benchmark show strong gains over prior prompt optimizers (91.3% accuracy at m=5 and 99.3% at m=3), supported by two deterministic structures: capability offloading to a Python runtime and structural consensus via variance-limited reasoning subgraphs.
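The refinement loop described above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: `teacher_critique`, `apply_critique`, and `sgde_step` are hypothetical names, and the teacher/compiler calls are stubbed so the loop runs deterministically.

```python
# Illustrative sketch of one SGDe refinement iteration. All names here are
# hypothetical stand-ins, not the paper's API; the frontier teacher and the
# critique compiler are stubbed for a self-contained, runnable example.

def teacher_critique(artifact: str, failures: list) -> str:
    """Stub for the frontier teacher: emits a natural-language critique
    that acts as a directional gradient over the workflow artifact."""
    return f"offload arithmetic to Python ({len(failures)} failures)"

def apply_critique(artifact: str, critique: str) -> str:
    """Stub for compiling the critique into a revised execution plan."""
    return artifact + "\n# revision: " + critique

def sgde_step(artifact, examples, evaluate):
    """One descent step in the discrete semantic space: collect training
    failures, request a critique, apply it. Returns (artifact, converged)."""
    failures = [ex for ex in examples if not evaluate(artifact, ex)]
    if not failures:
        return artifact, True
    return apply_critique(artifact, teacher_critique(artifact, failures)), False

# Toy usage: the artifact "converges" once a revision has been applied.
evaluate = lambda art, ex: "revision" in art
plan, done = sgde_step("plan-v0", [1, 2, 3], evaluate)
print(done)  # → False (first step collects 3 failures and applies a critique)
plan, done = sgde_step(plan, [1, 2, 3], evaluate)
print(done)  # → True (no failures remain on the training set)
```

The point of the sketch is the control flow: the "gradient" is a piece of text, and the update is a discrete edit to the execution plan rather than a parameter change.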

Abstract

Enterprise deployment of small language models (SLMs) is constrained by epistemic asymmetry: SLMs cannot self-correct reasoning errors, while frontier LLMs are prohibitively costly and face data sovereignty limits for high-volume use. We propose Semantic Gradient Descent (SGDe), a teacher-student framework that compiles agentic workflows into discrete execution plans comprising DAG topologies, system prompts, and deterministic executable code. The trailing "e" distinguishes SGDe from stochastic gradient descent. SGDe operates in a discrete semantic space where a frontier teacher generates natural-language critiques acting as directional gradients to iteratively refine the SLM's workflow artefacts. We formalise SGDe within a PAC learning framework, establishing sample-complexity bounds that enable convergence with as few as three training examples on targeted synthetic tasks by leveraging the teacher as a statistical prior. On a GSM-Hard-derived test set built via adversarial synthesis, compiled workflows reach 91.3% accuracy at m=5 and 99.3% at m=3 within the small-m regime motivated by Corollary 1, a +26.3% to +34.3% absolute improvement over state-of-the-art prompt optimisers. In the emerging paradigm of harness engineering, SGDe treats placement of deterministic code (which subtasks to delegate to a Python runtime versus retain as LLM calls) as a trace-driven, per-node optimisation target, generalising the whole-problem offloading of PAL and PoT. The teacher compiles two complementary deterministic structures: capability offloading, which delegates subtasks to Python when the SLM cannot execute them reliably, and structural consensus, which wraps variance-limited reasoning steps in fan-out/fan-in subgraphs aggregated by deterministic voting.
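The two deterministic structures named at the end of the abstract can each be sketched as a DAG node. This is a minimal illustration under stated assumptions: `offload_node` and `consensus_node` are hypothetical names, the SLM is stubbed, and the offloaded subtask is restricted to sandboxed arithmetic.

```python
from collections import Counter

def offload_node(expr: str) -> str:
    """Capability offloading: delegate a subtask the SLM cannot execute
    reliably (here, arithmetic) to the Python runtime. Builtins are
    stripped so only arithmetic expressions evaluate."""
    return str(eval(expr, {"__builtins__": {}}, {}))

def consensus_node(slm_call, prompt: str, m: int = 5) -> str:
    """Structural consensus: fan out m SLM calls on a variance-limited
    reasoning step, then fan in with a deterministic majority vote
    (ties broken by first occurrence, keeping aggregation deterministic)."""
    answers = [slm_call(prompt) for _ in range(m)]
    counts = Counter(answers)
    top = max(counts.values())
    return next(a for a in answers if counts[a] == top)

# Usage with a stubbed, noisy-but-usually-right SLM at m=5:
noisy = iter(["17", "17", "16", "17", "18"])
print(consensus_node(lambda p: next(noisy), "How many widgets remain?", m=5))  # → 17
print(offload_node("(123456 * 789) % 1000"))  # → 784
```

Whole-problem offloading in the style of PAL/PoT would route the entire question through `offload_node`; the paper's per-node framing instead lets the teacher decide, trace by trace, which subgraph gets which treatment.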