AI Navigate

DesertFormer: Transformer-Based Semantic Segmentation for Off-Road Desert Terrain Classification in Autonomous Navigation Systems

arXiv cs.CV / 3/19/2026

📰 News · Models & Research

Key Points

  • DesertFormer uses a SegFormer B2 backbone to perform semantic segmentation of desert terrain, enabling safety‑aware path planning for autonomous navigation in off‑road environments.
  • It classifies terrain into ten ecologically meaningful categories (Trees, Lush Bushes, Dry Grass, Dry Bushes, Ground Clutter, Flowers, Logs, Rocks, Landscape, Sky) and is trained on a dataset of 4,176 annotated images at 512×512 resolution.
  • The model achieves a mean IoU of 64.4% and pixel accuracy of 86.1%, representing a 24.2‑point absolute improvement over a DeepLabV3 MobileNetV2 baseline.
  • The authors provide a failure analysis identifying key confusion patterns and propose mitigations (class‑weighted training and copy‑paste augmentation) along with code, checkpoints, and an interactive inference dashboard on GitHub.
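For readers unfamiliar with the headline metric: mean IoU is the per-class intersection-over-union between predicted and ground-truth masks, averaged across the ten terrain classes. A minimal NumPy sketch (illustrative only, not the authors' evaluation code; the toy label maps and class count are assumptions):

```python
import numpy as np

NUM_CLASSES = 10  # Trees, Lush Bushes, Dry Grass, ..., Landscape, Sky

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = NUM_CLASSES) -> float:
    """Mean Intersection-over-Union over classes present in either mask."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        gt_c = gt == c
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:  # class absent from both masks: skip it
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 label maps using only classes 0 and 1:
pred = np.array([[0, 0], [1, 1]])
gt   = np.array([[0, 1], [1, 1]])
# class 0: inter=1, union=2 -> 0.50; class 1: inter=2, union=3 -> 0.6667
print(round(mean_iou(pred, gt), 4))  # → 0.5833
```

Pixel accuracy, the second reported metric, is simply `(pred == gt).mean()`; mIoU is the stricter of the two because rare classes count equally toward the average.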

Abstract

Reliable terrain perception is a fundamental requirement for autonomous navigation in unstructured, off-road environments. Desert landscapes present unique challenges due to low chromatic contrast between terrain categories, extreme lighting variability, and sparse vegetation that defy the assumptions of standard road-scene segmentation models. We present DesertFormer, a semantic segmentation pipeline for off-road desert terrain analysis based on SegFormer B2 with a hierarchical Mix Transformer (MiT-B2) backbone. The system classifies terrain into ten ecologically meaningful categories -- Trees, Lush Bushes, Dry Grass, Dry Bushes, Ground Clutter, Flowers, Logs, Rocks, Landscape, and Sky -- enabling safety-aware path planning for ground robots and autonomous vehicles. Trained on a purpose-built dataset of 4,176 annotated off-road images at 512x512 resolution, DesertFormer achieves a mean Intersection-over-Union (mIoU) of 64.4% and pixel accuracy of 86.1%, representing a +24.2% absolute improvement over a DeepLabV3 MobileNetV2 baseline (41.0% mIoU). We further contribute a systematic failure analysis identifying the primary confusion patterns -- Ground Clutter to Landscape and Dry Grass to Landscape -- and propose class-weighted training and copy-paste augmentation for rare terrain categories. Code, checkpoints, and an interactive inference dashboard are released at https://github.com/Yasaswini-ch/Vision-based-Desert-Terrain-Segmentation-using-SegFormer.
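The class-weighted training proposed as a mitigation typically means weighting the cross-entropy loss inversely to each class's pixel frequency, so that rare categories such as Flowers or Logs contribute more to the gradient than dominant ones like Landscape or Sky. A hedged sketch of one common inverse-frequency scheme (the paper's exact weighting formula is not specified here; the helper name and toy data are illustrative):

```python
import numpy as np

def inverse_frequency_weights(label_map: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class loss weights proportional to 1/pixel-frequency, mean-normalized."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes).astype(float)
    freq = counts / counts.sum()
    # inverse frequency; absent classes get weight 0
    weights = np.where(freq > 0, 1.0 / np.maximum(freq, 1e-12), 0.0)
    # normalize so the average weight over present classes is 1
    present = weights > 0
    weights[present] /= weights[present].mean()
    return weights

# Toy map: class 0 covers 6 of 8 pixels, class 1 the remaining 2
labels = np.array([[0, 0, 0, 0], [0, 0, 1, 1]])
w = inverse_frequency_weights(labels, num_classes=2)
# the rarer class 1 receives a 3x larger weight than class 0
```

In a PyTorch training loop such a vector would be passed as the `weight` argument of the cross-entropy loss; copy-paste augmentation, the other proposed mitigation, instead increases a rare class's pixel count directly by pasting its segments into other training images.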