Privacy-Preserving Semantic Segmentation from Ultra-Low-Resolution RGB Inputs

arXiv cs.RO / 4/7/2026


Key Points

  • The paper addresses privacy risk in RGB-based semantic segmentation by using ultra-low-resolution RGB inputs that suppress sensitive visual information during acquisition.
  • It proposes a fully joint-learning framework designed to reduce optimization conflicts caused by severe visual degradation at ultra-low resolutions.
  • Experimental results indicate improved semantic segmentation performance over representative baselines while maintaining a favorable privacy–utility trade-off.
  • The authors validate the approach in a real-world robotic object-goal navigation task, showing effective downstream task execution under highly degraded visual inputs.
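To make "ultra-low-resolution RGB input" concrete, the sketch below simulates such an input in software by averaging non-overlapping pixel blocks. This is only an illustration: the paper suppresses detail at acquisition time rather than by post-processing, and the 64x64 frame size, the factor of 8, and the helper name `downsample_area` are assumptions made here, not values or names from the paper.

```python
import numpy as np

def downsample_area(image: np.ndarray, factor: int) -> np.ndarray:
    """Average non-overlapping factor x factor blocks of an H x W x C image."""
    h, w, c = image.shape
    if h % factor or w % factor:
        raise ValueError("image height and width must be divisible by factor")
    # Split each spatial axis into (blocks, pixels-within-block), then average
    # over the two within-block axes.
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

# Hypothetical example: a random 64x64 RGB frame reduced to 8x8.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)
low_res = downsample_area(frame, factor=8)
print(low_res.shape)  # (8, 8, 3)
```

Each output pixel is the mean of 64 input pixels, so fine detail such as faces or text is averaged away while coarse color layout survives, which is the intuition behind the privacy–utility trade-off discussed above.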

Abstract

RGB-based semantic segmentation has become a mainstream approach to visual perception and is widely applied in downstream tasks. However, existing methods typically rely on high-resolution RGB inputs, which may expose sensitive visual content in privacy-critical environments. Ultra-low-resolution RGB sensing suppresses sensitive information directly during image acquisition, making it an attractive privacy-preserving alternative. Nevertheless, recovering semantic segmentation from ultra-low-resolution RGB inputs remains highly challenging due to severe visual degradation. In this work, we introduce a novel fully joint-learning framework for ultra-low-resolution semantic segmentation that mitigates the optimization conflicts exacerbated by visual degradation. Experiments demonstrate that our method outperforms representative baselines in semantic segmentation performance, and that our ultra-low-resolution RGB input achieves a favorable trade-off between privacy preservation and segmentation performance. We also deploy our privacy-preserving semantic segmentation method in a real-world robotic object-goal navigation task, demonstrating successful downstream task execution even under severe visual degradation.
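The abstract does not name the metric behind "semantic segmentation performance". A common choice for this task is mean intersection-over-union (mIoU), and the following minimal sketch shows how it is typically computed from integer label maps. The function name `mean_iou` and the toy 2x2 label maps are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    ious = []
    for cls in range(num_classes):
        pred_mask = pred == cls
        target_mask = target == cls
        union = np.logical_or(pred_mask, target_mask).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        intersection = np.logical_and(pred_mask, target_mask).sum()
        ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy example (assumed data): one pixel of class 0 is mislabeled as class 1.
target = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))  # IoU is 1/2 for class 0, 2/3 for class 1
```

In this toy case the per-class IoUs are 1/2 and 2/3, giving a mean of 7/12. Skipping classes that are absent from both maps is one common convention; other evaluation protocols accumulate intersections and unions over the whole dataset before dividing.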