Semantic Segmentation of Textured Non-manifold 3D Meshes using Transformers

arXiv cs.CV / 4/3/2026


Key Points

  • The paper proposes a texture-aware transformer for semantic segmentation on textured non-manifold 3D meshes, addressing the difficulty of irregular mesh structure while leveraging texture information from raw face-associated pixels.
  • It introduces a hierarchical multi-scale feature aggregation scheme that combines a texture branch (pixel aggregation into a learnable token) with geometric descriptors processed through Two-Stage Transformer Blocks to balance local and global context.
  • Experiments on the Semantic Urban Meshes (SUM) benchmark show strong results (81.9% mF1, 94.3% OA), with additional evaluation on a newly curated cultural-heritage roof-tile dataset (49.7% mF1, 72.8% OA).
  • The method significantly outperforms existing approaches, indicating that jointly modeling texture and geometry in transformer architectures can improve per-face semantic and damage-type predictions for complex meshes.

Abstract

Textured 3D meshes jointly represent geometry, topology, and appearance, yet their irregular structure poses significant challenges for deep-learning-based semantic segmentation. While a few recent methods operate directly on meshes without imposing geometric constraints, they typically overlook the rich textural information also provided by such meshes. We introduce a texture-aware transformer that learns directly from raw pixels associated with each mesh face, coupled with a new hierarchical learning scheme for multi-scale feature aggregation. A texture branch summarizes all face-level pixels into a learnable token, which is fused with geometrical descriptors and processed by a stack of Two-Stage Transformer Blocks (TSTB), which allow for both a local and a global information flow. We evaluate our model on the Semantic Urban Meshes (SUM) benchmark and a newly curated cultural-heritage dataset comprising textured roof tiles with triangle-level annotations for damage types. Our method achieves 81.9% mF1 and 94.3% OA on SUM and 49.7% mF1 and 72.8% OA on the new dataset, substantially outperforming existing approaches.
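The texture branch described above — pooling a variable number of face-associated pixels into a single learnable token, then fusing it with per-face geometric descriptors — can be sketched roughly as attention pooling with a learned query. This is a minimal NumPy illustration under assumed dimensions; the function names, the dot-product pooling, and all sizes are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def face_texture_token(pixels, query, w_key):
    """Attention-pool a variable number of face pixels into one token.

    pixels: (n_pix, c) raw per-face pixel values (e.g. RGB)
    query:  (d,)       learnable texture token (random here for illustration)
    w_key:  (c, d)     projection from pixel space to token space
    """
    keys = pixels @ w_key                                # (n_pix, d)
    attn = softmax(keys @ query / np.sqrt(len(query)))   # (n_pix,) weights
    return attn @ keys                                   # (d,) pooled token

rng = np.random.default_rng(0)
c, d, g = 3, 8, 5                 # pixel channels, token dim, geometry dim
query = rng.normal(size=d)
w_key = rng.normal(size=(c, d))

# Faces can carry different pixel counts, so pooling must be length-agnostic.
faces_pixels = [rng.random((7, c)), rng.random((12, c))]
geom = rng.normal(size=(2, g))    # hypothetical per-face geometric descriptors

tokens = np.stack([face_texture_token(p, query, w_key) for p in faces_pixels])
fused = np.concatenate([tokens, geom], axis=1)  # (2, d + g), input to the
print(fused.shape)                              # transformer stack; -> (2, 13)
```

The point of the sketch is that pooling makes every face contribute a fixed-size token regardless of how many texture pixels it owns, so the fused per-face features can be batched into a transformer such as the TSTB stack.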