A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning

arXiv cs.RO / 4/17/2026


Key Points

  • The paper develops a nonasymptotic theory explaining how independent sub-Gaussian action errors in behavior cloning propagate through PD-controller-based closed-loop dynamics into sub-Gaussian position errors.
  • It introduces a gain-dependent proxy matrix X∞(K) that governs the tail of the position-error distribution, and hence the probability of task failure over a horizon T.
  • The failure probability factorizes into a gain-dependent amplification index Γ_T(K) and a term combining validation loss with a generalization slack, so training loss alone cannot predict closed-loop performance and controller gains become central to evaluation.
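  • Under shape-preserving structural assumptions, the authors derive the scalar bound X∞(K) ⪯ Ψ(K)X̄, with Ψ(K) decomposed into label difficulty, injection strength, and contraction. This ranks the four canonical PD regimes: compliant-overdamped (CO) gives the tightest bound, stiff-underdamped (SU) the loosest, and the ordering of stiff-overdamped versus compliant-underdamped is system-dependent.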
  • For a canonical scalar second-order PD system, the continuous-time stationary variance has the closed form σ²α/(2β), which is strictly monotone in stiffness and damping over the entire stable orthant. The exact zero-order-hold (ZOH) discretization inherits this monotonicity, consistent with the empirical advantage of compliant, overdamped controllers for BC success.
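
The closed form reported in the abstract, X∞ᶜ(α,β) = σ²α/(2β), can be recovered with a short Lyapunov-equation calculation. The sketch below is one plausible reading, not the paper's own derivation. It assumes a unit-mass scalar system regulated about a fixed target x⋆, with α as the stiffness gain, β as the damping gain, and the action error e entering through the position target as white noise of intensity σ². These modeling choices are assumptions and are not stated in the summary.

```latex
% Assumed model (illustrative): \ddot{x} = \alpha (x^\star + e - x) - \beta \dot{x},
% with \alpha the stiffness gain, \beta the damping gain, e the action error.
\begin{align*}
\ddot{\varepsilon} + \beta\,\dot{\varepsilon} + \alpha\,\varepsilon &= \alpha\, e,
  \qquad \varepsilon = x - x^\star, \\
A &= \begin{pmatrix} 0 & 1 \\ -\alpha & -\beta \end{pmatrix}, \quad
B = \begin{pmatrix} 0 \\ \alpha \end{pmatrix}, \quad
X = \begin{pmatrix} p & q \\ q & r \end{pmatrix}, \\
0 &= A X + X A^\top + \sigma^2 B B^\top
   = \begin{pmatrix}
       2q & r - \alpha p - \beta q \\
       r - \alpha p - \beta q & \sigma^2\alpha^2 - 2\alpha q - 2\beta r
     \end{pmatrix}, \\
q &= 0, \qquad r = \frac{\sigma^2\alpha^2}{2\beta}, \qquad
p = \frac{r}{\alpha} = \frac{\sigma^2\alpha}{2\beta}, \\
\frac{\partial p}{\partial \alpha} &= \frac{\sigma^2}{2\beta} > 0, \qquad
\frac{\partial p}{\partial \beta} = -\frac{\sigma^2\alpha}{2\beta^2} < 0
\qquad (\alpha > 0,\ \beta > 0).
\end{align*}
```

Under these assumptions the position variance p grows with stiffness and shrinks with damping everywhere in the stable orthant α > 0, β > 0. The signs of the partial derivatives do not depend on the damping ratio, which is why the monotonicity covers both underdamped and overdamped regimes.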

Abstract

Behavior cloning (BC) policies on position-controlled robots inherit the closed-loop response of the underlying PD controller, yet the effect of controller gains on BC failure lacks a nonasymptotic theory. We show that independent sub-Gaussian action errors propagate through the gain-dependent closed-loop dynamics to yield sub-Gaussian position errors whose proxy matrix $X_\infty(K)$ governs the failure tail. The probability of horizon-$T$ task failure factorizes into a gain-dependent amplification index $\Gamma_T(K)$ and the validation loss plus a generalization slack, so training loss alone cannot predict closed-loop performance. Under shape-preserving upper-bound structural assumptions the proxy admits the scalar bound $X_\infty(K)\preceq\Psi(K)\bar X$ with $\Psi(K)$ decomposed into label difficulty, injection strength, and contraction, ranking the four canonical regimes with compliant-overdamped (CO) tightest, stiff-underdamped (SU) loosest, and the stiff-overdamped versus compliant-underdamped ordering system-dependent. For the canonical scalar second-order PD system the closed-form continuous-time stationary variance $X_\infty^{\mathrm{c}}(\alpha,\beta)=\sigma^2\alpha/(2\beta)$ is strictly monotone in stiffness and damping over the entire stable orthant, covering both underdamped and overdamped regimes, and the exact zero-order-hold (ZOH) discretization inherits this monotonicity. The analysis provides the first nonasymptotic explanation of the empirical finding that compliant, overdamped controllers improve BC success rates.
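
The ZOH claim can be checked numerically with a small pure-Python sketch. It is not the paper's code and rests on several assumptions: a unit-mass scalar system x'' + βx' + αx = αe, i.i.d. action errors e_k with variance σ² held constant over each control period h, and four hypothetical gain pairs standing in for the CO, CU, SO, and SU regimes. Under this model the held errors behave like white noise of intensity σ²h for small h, so the sketch divides the discrete variance by h before comparing it with σ²α/(2β).

```python
def matmul(P, Q):
    """Product of two small matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def expm(M, squarings=10, terms=12):
    """Matrix exponential: truncated Taylor series plus scaling and squaring."""
    n = len(M)
    S = [[x / 2 ** squarings for x in row] for row in M]
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in E]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, S)]  # S^k / k!
        E = [[E[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        E = matmul(E, E)
    return E


def zoh(alpha, beta, h):
    """Exact ZOH discretization of the assumed model x'' + beta x' + alpha x = alpha e.

    Uses exp([[A, B], [0, 0]] * h) = [[A_d, B_d], [0, 1]].
    """
    M = [[0.0, h, 0.0],
         [-alpha * h, -beta * h, alpha * h],
         [0.0, 0.0, 0.0]]
    E = expm(M)
    A_d = [E[0][:2], E[1][:2]]
    B_d = [E[0][2], E[1][2]]
    return A_d, B_d


def stationary_position_variance(alpha, beta, h, sigma=1.0):
    """Position entry of the solution of X = A_d X A_d^T + sigma^2 B_d B_d^T."""
    A, B_d = zoh(alpha, beta, h)
    X = [[sigma ** 2 * B_d[i] * B_d[j] for j in range(2)] for i in range(2)]
    for _ in range(40):  # doubling: each pass doubles the number of summed terms
        At = [list(col) for col in zip(*A)]
        AXA = matmul(matmul(A, X), At)
        X = [[X[i][j] + AXA[i][j] for j in range(2)] for i in range(2)]
        A = matmul(A, A)
    return X[0][0]


# Hypothetical (stiffness alpha, damping beta) pairs, chosen only for illustration.
REGIMES = {"CO": (4.0, 8.0), "CU": (4.0, 0.8), "SO": (100.0, 40.0), "SU": (100.0, 4.0)}

if __name__ == "__main__":
    h = 0.01  # assumed control period in seconds
    for name, (alpha, beta) in REGIMES.items():
        discrete = stationary_position_variance(alpha, beta, h) / h
        closed = alpha / (2.0 * beta)
        print(f"{name}: ZOH/h = {discrete:.4f}, sigma^2*alpha/(2*beta) = {closed:.4f}")
```

With these illustrative gains the normalized ZOH variance should land close to the continuous-time value, with CO smallest and SU largest. The relative order of SO and CU in the output is specific to the chosen numbers and should not be read as general, in line with the abstract's statement that this ordering is system-dependent.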