Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

arXiv cs.AI / 4/30/2026


Key Points

  • Existing typed-composition approaches for robot skill libraries assume skills are fixed at test time, so they don’t measure how outcomes change when one skill is replaced with an updated version.
  • The paper introduces a cross-version “paired-sampling” swap protocol to study composition sensitivity to skill updates, finding a strong “dominant-skill” effect in a dual-arm peg-in-hole task.
  • Results show that whether a dominant skill is included in a composition can shift success rates by up to +50 percentage points, and that off-policy behavioral distance metrics cannot reliably identify the dominant skill.
  • To enable skill-update governance, the authors propose an atomic-quality probe and a Hybrid Selector that combines low-cost per-skill probes with selective (expensive) composition revalidation, and they characterize the resulting cost–accuracy Pareto frontier.
  • Across 144 skill-update decisions, the atomic-only probe is within 3pp of full revalidation on the cross-task average (under a mixed-oracle caveat), though on T6 alone it trails by 23pp. The authors present it as a practical primitive for managing compositional robot policies as skill libraries evolve.
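
The swap comparison in the key points can be illustrated with a small sketch. The summary does not spell out the protocol, so this assumes one plausible reading: the same composition is rolled out with the old and the new version of a single skill under matched seeds, and the outcomes are compared pairwise. The function name `paired_swap_delta` and the returned field names are hypothetical.

```python
from typing import Dict, Sequence


def paired_swap_delta(with_old: Sequence[bool], with_new: Sequence[bool]) -> Dict[str, float]:
    """Compare one composition before and after swapping a single skill version.

    Assumption (not from the paper): with_old[i] and with_new[i] are success
    flags from rollouts that share the same seed / initial condition.
    """
    if len(with_old) != len(with_new) or len(with_old) == 0:
        raise ValueError("need two non-empty outcome lists of equal length")

    n = len(with_old)
    rate_old = sum(with_old) / n
    rate_new = sum(with_new) / n

    # Discordant pairs: episodes whose outcome flipped when the skill was swapped.
    gained = sum(1 for o, w in zip(with_old, with_new) if w and not o)
    lost = sum(1 for o, w in zip(with_old, with_new) if o and not w)

    return {
        "rate_old": rate_old,
        "rate_new": rate_new,
        # Shift in success rate, in percentage points (pp).
        "delta_pp": 100.0 * (rate_new - rate_old),
        "gained": gained,
        "lost": lost,
    }
```

Pairing on seeds means the discordant counts (`gained`, `lost`) isolate the effect of the swap from episode-to-episode variation, which is the usual motivation for paired designs.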

Abstract

Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
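
The following is a minimal sketch of how a hybrid selector of this kind could be wired up; it is not the authors' implementation. The abstract does not define the selection rule or the meaning of m, so the rule used here is an assumption: trust the cached atomic probe unless the old and new versions' atomic success rates lie within `band_pp` percentage points of each other, and pay for composition revalidation only in that ambiguous case. The names `UpdateEvent`, `atomic_only`, `hybrid_select`, `evaluate`, and `band_pp` are all hypothetical, and `band_pp` is not claimed to be the paper's m.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class UpdateEvent:
    """One skill-update decision (hypothetical structure)."""
    atomic_old: float     # cached atomic success rate of the current version, in [0, 1]
    atomic_new: float     # cached atomic success rate of the candidate version, in [0, 1]
    oracle_accept: bool   # reference decision used only for scoring


def atomic_only(event: UpdateEvent) -> bool:
    """Accept the update iff the candidate's atomic success rate is not lower."""
    return event.atomic_new >= event.atomic_old


def hybrid_select(
    event: UpdateEvent,
    revalidate: Callable[[UpdateEvent], bool],
    band_pp: float,
) -> Tuple[bool, bool]:
    """Return (decision, revalidated).

    Assumed rule: if the atomic gap is inside the ambiguity band, fall back
    to the expensive composition revalidation; otherwise trust the probe.
    """
    gap_pp = 100.0 * (event.atomic_new - event.atomic_old)
    if abs(gap_pp) < band_pp:
        return revalidate(event), True
    return gap_pp >= 0.0, False


def evaluate(
    events: Sequence[UpdateEvent],
    revalidate: Callable[[UpdateEvent], bool],
    band_pp: float,
) -> Tuple[float, float]:
    """Return (oracle match rate, fraction of events that were revalidated).

    The second value is a proxy for cost relative to revalidating everything,
    treating cached atomic probes as free per decision.
    """
    if not events:
        raise ValueError("events must be non-empty")
    matches = 0
    revalidations = 0
    for event in events:
        decision, revalidated = hybrid_select(event, revalidate, band_pp)
        matches += int(decision == event.oracle_accept)
        revalidations += int(revalidated)
    return matches / len(events), revalidations / len(events)
```

Sweeping `band_pp` from zero upward and plotting the two values returned by `evaluate` traces a cost versus oracle-match curve of the kind the abstract describes: a band of zero reduces to the atomic-only probe, and a band wide enough to cover every event reduces to full revalidation.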