Evaluating Factor-Wise Auxiliary Dynamics Supervision for Latent Structure and Robustness in Simulated Humanoid Locomotion

arXiv cs.RO / 3/24/2026


Key Points

  • The paper evaluates DynaMITE, a transformer-based latent dynamics model trained with factor-wise auxiliary losses during PPO for simulated Unitree G1 humanoid locomotion, and finds that the supervised latent does not yield decodable or functionally separable factor structure.
  • Disentanglement and probing results for DynaMITE are near zero (e.g., probe R² ≈ 0; MIG/DCI/SAP near zero), while an unsupervised LSTM hidden state achieves higher factor-probe R² (up to 0.10).
  • A 2x2 factorial ablation (n = 10 seeds) indicates that the auxiliary losses provide no measurable gain in either in-distribution or severe out-of-distribution reward (+0.03 in both; p = 0.732 and p = 0.669), whereas the tanh bottleneck yields a small, consistent improvement (ID: +0.16, p = 0.207; OOD: +0.10, p = 0.208).
  • DynaMITE degrades less under severe combined perturbations (2.3% vs. 16.7%), but the study attributes this difference to representation compression from the bottleneck rather than to the auxiliary supervision.
  • Across four Isaac Lab humanoid locomotion tasks, LSTM attains the best nominal reward, and the authors conclude auxiliary dynamics supervision is not a reliable route to interpretability or meaningful robustness beyond bottleneck effects.
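
The probe R² figures in the points above come from fitting a simple decoder from the latent to each dynamics factor and scoring it on held-out data. The summary does not say which probe the authors used, so the following is only a minimal sketch assuming a ridge-regularized linear probe. The function name `probe_r2`, the 80/20 split, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def probe_r2(latents, factor, train_frac=0.8, ridge=1e-3, seed=0):
    """Held-out R^2 of a ridge-regularized linear probe latents -> factor.

    Hypothetical helper: the paper's actual probe may differ.
    latents: (n_samples, latent_dim), factor: (n_samples,)
    """
    rng = np.random.default_rng(seed)
    n = latents.shape[0]
    perm = rng.permutation(n)
    n_train = int(train_frac * n)
    tr, te = perm[:n_train], perm[n_train:]

    # Standardize with training statistics only, to avoid test leakage.
    mu = latents[tr].mean(axis=0)
    sd = latents[tr].std(axis=0) + 1e-8
    x_tr = np.hstack([(latents[tr] - mu) / sd, np.ones((len(tr), 1))])
    x_te = np.hstack([(latents[te] - mu) / sd, np.ones((len(te), 1))])

    # Closed-form ridge solution; the bias column is left unpenalized.
    reg = ridge * np.eye(x_tr.shape[1])
    reg[-1, -1] = 0.0
    w = np.linalg.solve(x_tr.T @ x_tr + reg, x_tr.T @ factor[tr])

    pred = x_te @ w
    ss_res = np.sum((factor[te] - pred) ** 2)
    ss_tot = np.sum((factor[te] - factor[te].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic illustration only (not the paper's data): a 24-d latent, one
# factor that is linearly encoded in it, and one that is unrelated to it.
demo_rng = np.random.default_rng(0)
demo_latents = demo_rng.normal(size=(2000, 24))
encoded = demo_latents @ demo_rng.normal(size=24) + 0.05 * demo_rng.normal(size=2000)
unrelated = demo_rng.normal(size=2000)

print("R^2, linearly encoded factor:", probe_r2(demo_latents, encoded))
print("R^2, unrelated factor:       ", probe_r2(demo_latents, unrelated))
```

Under these assumptions, a factor that the latent encodes linearly scores close to 1, while a factor the latent carries no information about scores close to 0 (held-out R² can be slightly negative). The second case is the pattern reported for DynaMITE.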

Abstract

We evaluate whether factor-wise auxiliary dynamics supervision produces useful latent structure or improved robustness in simulated humanoid locomotion. DynaMITE -- a transformer encoder with a factored 24-d latent trained by per-factor auxiliary losses during proximal policy optimization (PPO) -- is compared against Long Short-Term Memory (LSTM), plain Transformer, and Multilayer Perceptron (MLP) baselines on a Unitree G1 humanoid across four Isaac Lab tasks. The supervised latent shows no evidence of decodable or functionally separable factor structure: probe R^2 ~ 0 for all five dynamics factors, clamping any subspace changes reward by < 0.05, and standard disentanglement metrics (MIG, DCI, SAP) are near zero. An unsupervised LSTM hidden state achieves higher probe R^2 (up to 0.10). A 2x2 factorial ablation (n = 10 seeds) isolates the contributions of the tanh bottleneck and auxiliary losses: the auxiliary losses show no measurable effect on either in-distribution (ID) reward (+0.03, p = 0.732) or severe out-of-distribution (OOD) reward (+0.03, p = 0.669), while the bottleneck shows a small, consistent advantage in both regimes (ID: +0.16, p = 0.207; OOD: +0.10, p = 0.208). The bottleneck advantage persists under severe combined perturbation but does not amplify, indicating a training-time representation benefit rather than a robustness mechanism. LSTM achieves the best nominal reward on all four tasks (p < 0.03); DynaMITE degrades less under combined-shift stress (2.3% vs. 16.7%), but this difference is attributable to the bottleneck compression, not the auxiliary supervision. For locomotion practitioners: auxiliary dynamics supervision does not produce an interpretable estimator and does not measurably improve reward or robustness beyond what the bottleneck alone provides; recurrent baselines remain the stronger choice for nominal performance.
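
To make the 2x2 factorial ablation concrete, the sketch below computes a main effect as the difference of marginal means over the other factor and attaches a p-value. The abstract does not state which significance test was used, so the stratified permutation test here is an assumption, as are the function names, the array layout `[bottleneck, aux, seed]`, and the synthetic rewards (which are not the paper's results).

```python
import numpy as np

def main_effect(rewards, factor_axis):
    """Mean reward with the factor on minus mean reward with it off.

    rewards: array of shape (2, 2, n_seeds), indexed [bottleneck, aux, seed]
    (a hypothetical layout chosen for this sketch).
    """
    on = np.take(rewards, 1, axis=factor_axis)
    off = np.take(rewards, 0, axis=factor_axis)
    return on.mean() - off.mean()

def permutation_pvalue(rewards, factor_axis, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a main effect.

    Labels of the tested factor are shuffled within each level of the
    other factor, so the other factor's effect is held fixed.
    """
    rng = np.random.default_rng(seed)
    observed = main_effect(rewards, factor_axis)
    r = np.moveaxis(rewards, factor_axis, 0)  # tested factor -> axis 0
    n = r.shape[2]
    hits = 0
    for _ in range(n_perm):
        shuffled = np.empty_like(r)
        for j in range(2):  # each level of the other factor
            pooled = rng.permutation(np.concatenate([r[0, j], r[1, j]]))
            shuffled[0, j] = pooled[:n]
            shuffled[1, j] = pooled[n:]
        stat = shuffled[1].mean() - shuffled[0].mean()
        if abs(stat) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Synthetic rewards for illustration: 10 seeds per cell, a built-in
# bottleneck effect of +1.0, and no auxiliary-loss effect.
demo_rng = np.random.default_rng(1)
demo_rewards = demo_rng.normal(0.0, 0.1, size=(2, 2, 10))
demo_rewards[1] += 1.0  # bottleneck on

for name, axis in [("bottleneck", 0), ("aux losses", 1)]:
    effect = main_effect(demo_rewards, axis)
    p = permutation_pvalue(demo_rewards, axis)
    print(f"{name}: effect = {effect:+.3f}, p = {p:.3f}")
```

Reading the abstract's numbers through this lens, both reported bottleneck p-values (0.207 and 0.208) sit above the conventional 0.05 threshold, so "small, consistent advantage" describes the direction of the effect rather than a statistically significant difference.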