Spatially-Aware Evaluation Framework for Aerial LiDAR Point Cloud Semantic Segmentation: Distance-Based Metrics on Challenging Regions

arXiv cs.CV / 3/25/2026


Key Points

  • The paper argues that standard semantic segmentation metrics (mIoU, OA) are insufficient for aerial LiDAR because they treat all misclassifications equally and can hide performance gaps in spatially complex areas.
  • It proposes distance-based evaluation metrics that measure the geometric severity of each error by comparing the misclassified point’s location to the nearest ground-truth point of the predicted class.
  • It also introduces a focused “hard-points” evaluation that scores only points misclassified by at least one evaluated model, reducing domination by easy-to-classify samples.
  • Experiments on three aerial LiDAR datasets comparing three SOTA deep learning models show the new metrics reveal spatial error patterns relevant to Earth Observation tasks that conventional metrics miss.
  • The authors claim the framework supports more informed model selection when spatial consistency is critical for downstream geospatial products like Digital Terrain Models.
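The distance-based idea can be sketched concretely. The snippet below is a minimal illustration (not the paper's reference implementation): for every misclassified point, it finds the nearest ground-truth point carrying the *predicted* label and records that distance as a proxy for the error's geometric severity. The function name and the choice of SciPy's `cKDTree` for the nearest-neighbor search are assumptions for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def distance_weighted_errors(points, gt_labels, pred_labels):
    """For each misclassified point, compute the distance to the nearest
    ground-truth point of the *predicted* class. Larger distances mean
    the error is more geometrically severe (hypothetical metric sketch)."""
    points = np.asarray(points, dtype=float)
    gt = np.asarray(gt_labels)
    pred = np.asarray(pred_labels)
    wrong = np.nonzero(gt != pred)[0]          # indices of misclassified points
    dists = np.full(len(wrong), np.inf)
    for c in np.unique(pred[wrong]):
        class_pts = points[gt == c]            # ground-truth points of class c
        if len(class_pts) == 0:
            continue                           # predicted class absent from GT: leave inf
        idx = wrong[pred[wrong] == c]          # errors predicted as class c
        tree = cKDTree(class_pts)
        dists[np.searchsorted(wrong, idx)] = tree.query(points[idx])[0]
    return wrong, dists
```

Aggregating these per-error distances (e.g., their mean or a distance-weighted error rate) yields a metric that penalizes spatially isolated mistakes more than errors sitting on true class boundaries.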

Abstract

Semantic segmentation metrics for 3D point clouds, such as mean Intersection over Union (mIoU) and Overall Accuracy (OA), present two key limitations in the context of aerial LiDAR data. First, they treat all misclassifications equally regardless of their spatial context, overlooking cases where the geometric severity of errors directly impacts the quality of derived geospatial products such as Digital Terrain Models. Second, they are often dominated by the large proportion of easily classified points, which can mask meaningful differences between models and under-represent performance in challenging regions. To address these limitations, we propose a novel evaluation framework for comparing semantic segmentation models through two complementary approaches. First, we introduce distance-based metrics that account for the spatial deviation between each misclassified point and the nearest ground-truth point of the predicted class, capturing the geometric severity of errors. Second, we propose a focused evaluation on a common subset of hard points, defined as the points misclassified by at least one of the evaluated models, thereby reducing the bias introduced by easily classified points and better revealing differences in model performance in challenging regions. We validate our framework by comparing three state-of-the-art deep learning models on three aerial LiDAR datasets. Results demonstrate that the proposed metrics provide complementary information to traditional measures, revealing spatial error patterns that are critical for Earth Observation applications but invisible to conventional evaluation approaches. The proposed framework enables more informed model selection for scenarios where spatial consistency is critical.
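The hard-points evaluation described above can also be sketched in a few lines. This is an assumed reading of the definition in the abstract, not the authors' code: the hard subset is the union of points misclassified by at least one evaluated model, and any standard metric (here, plain accuracy, as a stand-in) is then recomputed on that subset only.

```python
import numpy as np

def hard_point_subset(gt_labels, model_preds):
    """Indices of points misclassified by at least one model ('hard points')."""
    gt = np.asarray(gt_labels)
    hard = np.zeros(len(gt), dtype=bool)
    for pred in model_preds:
        hard |= np.asarray(pred) != gt         # union of each model's errors
    return np.nonzero(hard)[0]

def subset_accuracy(gt_labels, pred_labels, subset):
    """Accuracy restricted to the given point indices."""
    gt = np.asarray(gt_labels)[subset]
    pred = np.asarray(pred_labels)[subset]
    return float(np.mean(pred == gt)) if len(subset) else float("nan")
```

Because easily classified points are excluded by construction, two models with near-identical overall accuracy can show clearly different scores on the hard subset, which is exactly the masking effect the abstract describes.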