VISION-SLS: Safe Perception-Based Control from Learned Visual Representations via System Level Synthesis

arXiv cs.LG / April 29, 2026


Key Points

  • VISION-SLS is a control method that uses high-resolution RGB images to compute nonlinear output-feedback control with robust constraint-satisfaction guarantees under calibrated uncertainty.
  • The approach combines a learned low-dimensional observation map built from pretrained visual features (with state-dependent error bounds) and a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS).
  • The authors introduce a scalable solver for the resulting nonconvex optimization problem, combining sequential convex programming with efficient Riccati recursions.
  • Experiments on a simulated 4D car, a 10D quadrotor, and a 59D humanoid with partial observability show safe, information-gathering behavior and constraint satisfaction under empirically calibrated error bounds.
  • Hardware validation demonstrates safe ground-vehicle control from onboard images, with improved safety rate and solve time versus baselines, and the code is published on GitHub.
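To make the policy class in the second key point concrete, here is a minimal NumPy sketch of a causal affine time-varying output-feedback policy, u_t = v_t + sum over k ≤ t of K[t][k] @ y_k, rolled out on a toy double-integrator with noisy observations. The system matrices, gains, and noise level are hypothetical placeholders for illustration; they are not the policy synthesized by VISION-SLS.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy dynamics (double integrator)
B = np.array([[0.0], [0.1]])             # input map
C = np.array([[1.0, 0.0]])               # observe position only

# Hypothetical causal gains: K[t][k] is the gain from measurement y_k to u_t,
# defined only for k <= t (lower-triangular structure enforces causality).
K = [[-0.5 * np.eye(1) for _ in range(t + 1)] for t in range(T)]
v = [np.zeros(1) for _ in range(T)]      # affine feedforward terms

x = np.array([1.0, 0.0])
ys = []
for t in range(T):
    y = C @ x + 0.01 * rng.standard_normal(1)   # noisy measurement
    ys.append(y)
    # Causality: u_t depends only on y_0, ..., y_t
    u = v[t] + sum(K[t][k] @ ys[k] for k in range(t + 1))
    x = A @ x + B @ u
```

In an SLS formulation, these gains are not free variables chosen by hand as above; they are parameterized through closed-loop system responses, which is what makes constraint satisfaction tractable to certify.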

Abstract

We propose VISION-SLS, a method for nonlinear output-feedback control from high-resolution RGB images that provides robust constraint-satisfaction guarantees under calibrated uncertainty bounds despite partial observability, sensor noise, and nonlinear dynamics. To enable scalability while retaining guarantees, we propose: (i) a learned low-dimensional observation map built from pretrained visual features with state-dependent error bounds, and (ii) a causal affine time-varying output-feedback policy optimized via System Level Synthesis (SLS). We develop a novel, scalable solver for the resulting nonconvex program that couples sequential convex programming with efficient Riccati recursions. On two simulated visuomotor tasks (a 4D car and a 10D quadrotor) with image observations of at least 512 × 512 pixels, and on a 59D humanoid task with partial observability, our method enables safe, information-gathering behavior that reduces uncertainty while guaranteeing constraint satisfaction under empirically calibrated error bounds. We also validate our method on hardware, safely controlling a ground vehicle from onboard images and outperforming baselines in safety rate and solve time. Together, these results show that learned visual abstractions coupled with an efficient solver make SLS-based safe visuomotor output-feedback practical at scale. An implementation of our method is available at https://github.com/trustworthyrobotics/VISION-SLS.
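The Riccati recursions mentioned in the abstract refer to a standard building block of efficient trajectory-optimization solvers. As a hedged illustration (the paper's actual solver is not reproduced here), the sketch below runs the backward Riccati recursion for a finite-horizon discrete-time LQR problem, producing time-varying feedback gains in O(T) time rather than solving one large dense program; the specific A, B, Q, R matrices are arbitrary examples.

```python
import numpy as np

def riccati_gains(A, B, Q, R, Qf, T):
    """Backward Riccati recursion for finite-horizon LQR.

    Dynamics x_{t+1} = A x_t + B u_t, stage cost x'Qx + u'Ru,
    terminal cost x'Qf x. Returns gains K_t (u_t = -K_t x_t) and
    the cost-to-go matrix at t = 0.
    """
    P = Qf
    gains = []
    for _ in range(T):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # optimal gain at this step
        P = Q + A.T @ P @ (A - B @ K)          # Riccati update
        gains.append(K)
    return gains[::-1], P

# Example system: double integrator with arbitrary illustrative weights.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
Qf = 10 * np.eye(2)
Ks, P0 = riccati_gains(A, B, Q, R, Qf, T=20)
```

Each sequential-convex-programming iteration of a solver like the one described can reuse a recursion of this shape on a convexified subproblem, which is what keeps per-iteration cost linear in the horizon length.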