GTPBD-MM: A Global Terraced Parcel and Boundary Dataset with Multi-Modality

arXiv cs.CV / 4/15/2026


Key Points

  • The paper introduces GTPBD-MM, described as the first global multimodal benchmark specifically for terraced agricultural parcel extraction in mountainous, elevation-varying scenes.
  • GTPBD-MM combines high-resolution optical imagery, structured text descriptions, and DEM data, enabling evaluations under three aligned settings: image-only, image+text, and image+text+DEM.
  • The authors motivate the benchmark by noting that existing datasets and benchmarks largely target flat, regular farmland and do not capture the irregular boundaries and cross-regional heterogeneity of terraced terrain.
  • They also propose ETTerra, an elevation- and text-guided multimodal baseline network intended to delineate terraced parcel boundaries by jointly leveraging semantic cues and terrain geometry.
  • Experiments indicate that textual semantics and DEM-based elevation and geometry cues provide complementary information beyond visual appearance, yielding more accurate, coherent, and structurally consistent parcel delineations than imagery alone.
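The three aligned evaluation settings can be pictured as one set of samples fed to a model with progressively more modalities. The sketch below is a minimal illustration under stated assumptions. `TerraceSample`, `SETTINGS`, `select_inputs`, the field names, and the example values are hypothetical and do not reflect GTPBD-MM's actual file layout or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical container for one aligned sample; field names are assumptions,
# not the dataset's real schema.
@dataclass
class TerraceSample:
    image: List[List[float]]                 # optical imagery (one band, for brevity)
    text: Optional[str] = None               # structured text description
    dem: Optional[List[List[float]]] = None  # elevation grid aligned to the image

# The three settings named in the paper, mapped to the modalities each one uses.
SETTINGS: Dict[str, Tuple[str, ...]] = {
    "image_only": ("image",),
    "image_text": ("image", "text"),
    "image_text_dem": ("image", "text", "dem"),
}

def select_inputs(sample: TerraceSample, setting: str) -> dict:
    """Return only the modalities allowed by `setting`, so the same samples
    can be evaluated under every setting without leaking extra inputs."""
    if setting not in SETTINGS:
        raise ValueError(f"unknown setting: {setting}")
    inputs = {}
    for name in SETTINGS[setting]:
        value = getattr(sample, name)
        if value is None:
            raise ValueError(f"setting {setting!r} requires modality {name!r}")
        inputs[name] = value
    return inputs

# Illustrative (made-up) sample used for the usage example.
sample = TerraceSample(
    image=[[0.1, 0.2], [0.3, 0.4]],
    text="Terraced parcels on a steep hillside",  # hypothetical description
    dem=[[100.0, 100.0], [102.0, 102.0]],
)

for setting in SETTINGS:
    print(setting, sorted(select_inputs(sample, setting)))
```

Keeping the samples fixed and varying only the selected modalities is what makes the three settings comparable: any accuracy difference can then be attributed to the added text or DEM input rather than to a change in the data.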

Abstract

Agricultural parcel extraction plays an important role in remote sensing-based agricultural monitoring, supporting parcel surveying, precision management, and ecological assessment. However, existing public benchmarks mainly focus on regular and relatively flat farmland scenes. In contrast, terraced parcels in mountainous regions exhibit stepped terrain, pronounced elevation variation, irregular boundaries, and strong cross-regional heterogeneity, making parcel extraction a more challenging problem that jointly requires visual recognition, semantic discrimination, and terrain-aware geometric understanding. Although recent studies have advanced visual parcel benchmarks and image-text farmland understanding, a unified benchmark for complex terraced parcel extraction under aligned image-text-DEM settings remains absent. To fill this gap, we present GTPBD-MM, the first multimodal benchmark for global terraced parcel extraction. Built upon GTPBD, GTPBD-MM integrates high-resolution optical imagery, structured text descriptions, and DEM data, and supports systematic evaluation under Image-only, Image+Text, and Image+Text+DEM settings. We further propose Elevation and Text guided Terraced parcel network (ETTerra), a multimodal baseline for terraced parcel delineation. Extensive experiments demonstrate that textual semantics and terrain geometry provide complementary cues beyond visual appearance alone, yielding more accurate, coherent, and structurally consistent delineation results in complex terraced scenes.
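One intuition for why terrain geometry helps is that terrace risers (the steep steps between parcels) show up as sharp slope changes in a DEM even where the optical appearance of neighbouring parcels is similar. The sketch below computes per-cell slope from a small DEM grid using central differences in the interior and one-sided differences at the borders, similar in spirit to `numpy.gradient`. This is a generic terrain-derivative example, not ETTerra's method: the summary does not describe how ETTerra encodes elevation, and `slope_degrees`, the toy grid, and the 1 m cell size are illustrative assumptions.

```python
import math
from typing import List

def slope_degrees(dem: List[List[float]], cell_size: float) -> List[List[float]]:
    """Slope in degrees for each DEM cell (hypothetical helper).

    Interior cells use central differences; border cells fall back to
    one-sided differences. `cell_size` is the ground spacing in the same
    units as the elevations.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Neighbour indices, clamped to the grid at the borders.
            il, ir = max(i - 1, 0), min(i + 1, rows - 1)
            jl, jr = max(j - 1, 0), min(j + 1, cols - 1)
            dzdx = (dem[i][jr] - dem[i][jl]) / ((jr - jl) * cell_size) if jr != jl else 0.0
            dzdy = (dem[ir][j] - dem[il][j]) / ((ir - il) * cell_size) if ir != il else 0.0
            # Slope angle from the gradient magnitude.
            out[i][j] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

# Toy stepped terrain: two flat treads separated by a 2 m riser (assumed values).
dem = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [2.0, 2.0, 2.0],
    [2.0, 2.0, 2.0],
]
slopes = slope_degrees(dem, cell_size=1.0)
for row in slopes:
    print([round(v, 1) for v in row])
```

On this grid the tread rows come out flat and the two rows adjacent to the step show a high slope, which is the kind of boundary signal an elevation-aware model could exploit alongside imagery and text.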