SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

arXiv cs.RO, April 17, 2026


Key Points

  • The paper introduces SpaceMind, a modular, self-evolving embodied vision-language agent framework for autonomous on-orbit servicing, a setting that demands robust 3D perception, spatial reasoning, and long-horizon multi-phase task execution.
  • SpaceMind separates the system into three independently extensible dimensions: dynamically routed skill modules, MCP tools with configurable profiles, and injectable reasoning-mode skills.
  • An MCP-Redis interface layer lets the same codebase run unchanged across UE5 simulation and physical laboratory/robot hardware, reducing transfer friction between environments.
  • The authors report extensive validation: 192 closed-loop runs across five satellites, multiple task types, and deliberately degraded conditions, with 90–100% navigation success under nominal conditions and a "Prospective mode" that uniquely succeeds in degraded search-and-approach tasks where the other modes fail.
  • The skill self-evolution mechanism distills operational experience into persistent skill files without model fine-tuning; the study shows recovery from a single failed episode in four of six groups, plus zero-code-modification transfer to a physical robot with 100% rendezvous success.
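The "skill file" idea in the last bullet, learning as a persisted text artifact rather than weight updates, can be sketched as follows. This is a minimal illustration, not the paper's implementation; the file layout, field names (`lesson`, `evidence`), and the `skills/` directory are all assumptions.

```python
import json
from pathlib import Path

SKILL_DIR = Path("skills")  # hypothetical location for persistent skill files

def distill_skill(episode_log: list[dict], lesson: str, name: str) -> Path:
    """Distill a (possibly failed) episode into a persistent skill file.

    No model fine-tuning happens: the 'learning' is a JSON artifact that
    can be injected into the agent's context on later runs.
    """
    SKILL_DIR.mkdir(exist_ok=True)
    skill = {
        "name": name,
        "lesson": lesson,  # e.g. advice distilled from a failure analysis
        "evidence": [step["action"] for step in episode_log if step.get("failed")],
    }
    path = SKILL_DIR / f"{name}.json"
    path.write_text(json.dumps(skill, indent=2))
    return path

def load_skills() -> list[dict]:
    """Load all persisted skills for injection into the agent's prompt."""
    return [json.loads(p.read_text()) for p in sorted(SKILL_DIR.glob("*.json"))]
```

The point of this pattern is that a single failed episode can immediately change future behavior, which matches the recovery results the paper reports, without touching the underlying VLM.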

Abstract

Autonomous on-orbit servicing demands embodied agents that perceive through visual sensors, reason about 3D spatial situations, and execute multi-phase tasks over extended horizons. We present SpaceMind, a modular and self-evolving vision-language model (VLM) agent framework that decomposes knowledge, tools, and reasoning into three independently extensible dimensions: skill modules with dynamic routing, Model Context Protocol (MCP) tools with configurable profiles, and injectable reasoning-mode skills. An MCP-Redis interface layer enables the same codebase to operate across simulation and physical hardware without modification, and a Skill Self-Evolution mechanism distills operational experience into persistent skill files without model fine-tuning. We validate SpaceMind through 192 closed-loop runs across five satellites, three task types, and two environments, a UE5 simulation and a physical laboratory, deliberately including degraded conditions to stress-test robustness. Under nominal conditions all modes achieve 90--100% navigation success; under degradation, the Prospective mode uniquely succeeds in search-and-approach tasks where other modes fail. A self-evolution study shows that the agent recovers from failure in four of six groups from a single failed episode, including complete failure to 100% success and inspection scores improving from 12 to 59 out of 100. Real-world validation confirms zero-code-modification transfer to a physical robot with 100% rendezvous success. Code: https://github.com/wuaodi/SpaceMind
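The sim-to-real claim rests on the interface layer: agent code talks to a channel, and an environment adapter (UE5 simulation or the lab robot) answers on the other side, so swapping environments swaps only the adapter. The sketch below illustrates that decoupling with an in-process bus standing in for the paper's MCP-Redis layer; the channel name, adapter functions, and message fields are illustrative assumptions, not taken from the paper or its code.

```python
from typing import Callable

class Bus:
    """Minimal in-process stand-in for a Redis-style request/reply channel."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], dict]] = {}

    def serve(self, channel: str, handler: Callable[[dict], dict]) -> None:
        # Register whichever environment adapter backs this channel.
        self.handlers[channel] = handler

    def call(self, channel: str, msg: dict) -> dict:
        return self.handlers[channel](msg)

# Two interchangeable environment adapters behind the same channel:
def sim_adapter(msg: dict) -> dict:
    return {"env": "ue5_sim", "ok": True, "cmd": msg["cmd"]}

def robot_adapter(msg: dict) -> dict:
    return {"env": "lab_robot", "ok": True, "cmd": msg["cmd"]}

def agent_step(bus: Bus) -> dict:
    # Agent code is identical regardless of which adapter is registered.
    return bus.call("actuation", {"cmd": "approach_target"})
```

Because `agent_step` never imports or branches on the environment, moving from simulation to hardware means re-registering one handler, which is the zero-code-modification transfer property the abstract describes.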