TopoMesh: High-Fidelity Mesh Autoencoding via Topological Unification

arXiv cs.CV · March 26, 2026


Key Points

  • The paper identifies a core limitation of existing mesh VAE methods: ground-truth meshes have arbitrary, variable topology, while common VAEs predict fixed-structure representations (e.g., SDF on regular grids), causing a representation mismatch that weakens explicit mesh supervision.
  • TopoMesh proposes a sparse voxel-based VAE that “unifies” both ground-truth and predicted meshes using a shared Dual Marching Cubes (DMC) topological representation, enabling direct vertex/face-level correspondences.
  • It remeshes arbitrary input meshes into DMC-compliant forms via an edge-preserving algorithm that uses an L∞ distance metric, improving the preservation of sharp geometric features during reconstruction.
  • The decoder outputs meshes in the same DMC format as the targets, allowing the model to use explicit mesh-level supervision signals for topology, vertex positions, and face orientations with clearer gradients.
  • Training uses teacher forcing and progressive-resolution training for stable, efficient convergence, and experiments reportedly show TopoMesh improves reconstruction fidelity over prior VAE approaches, especially for sharp edges and fine details.
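To make the L∞ idea concrete: the L∞ (Chebyshev) distance between two points is the maximum per-axis deviation rather than the Euclidean norm. The sketch below is illustrative only (the paper's actual remeshing algorithm is not reproduced here); it shows why an L∞ criterion treats a diagonal offset and an axis-aligned offset of the same per-axis magnitude identically, so snapping vertices toward grid-aligned positions is not systematically favored over keeping them on sharp diagonal edges.

```python
import numpy as np

def linf_distance(p, q):
    """Chebyshev (L-infinity) distance: the maximum per-axis deviation."""
    return np.max(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

# Under the Euclidean (L2) metric, a diagonal displacement of 0.5 per axis
# has length ~0.866, while an axis-aligned displacement of 0.5 has length 0.5.
# Under L-infinity, both score 0.5, so a distance threshold treats vertices
# lying on sharp diagonal features the same as axis-aligned ones.
diag = linf_distance([0, 0, 0], [0.5, 0.5, 0.5])   # 0.5
axis = linf_distance([0, 0, 0], [0.5, 0.0, 0.0])   # 0.5
print(diag, axis)
```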

Abstract

The dominant paradigm for high-fidelity 3D generation relies on a VAE-Diffusion pipeline, where the VAE's reconstruction capability sets a firm upper bound on generation quality. A fundamental challenge limiting existing VAEs is the representation mismatch between ground-truth meshes and network predictions: GT meshes have arbitrary, variable topology, while VAEs typically predict fixed-structure implicit fields (e.g., SDF on regular grids). This inherent misalignment prevents establishing explicit mesh-level correspondences, forcing prior work to rely on indirect supervision signals such as SDF or rendering losses. Consequently, fine geometric details, particularly sharp features, are poorly preserved during reconstruction. To address this, we introduce TopoMesh, a sparse voxel-based VAE that unifies both GT and predicted meshes under a shared Dual Marching Cubes (DMC) topological framework. Specifically, we convert arbitrary input meshes into DMC-compliant representations via a remeshing algorithm that preserves sharp edges using an L∞ distance metric. Our decoder outputs meshes in the same DMC format, ensuring that both predicted and target meshes share identical topological structures. This establishes explicit correspondences at the vertex and face level, allowing us to derive explicit mesh-level supervision signals for topology, vertex positions, and face orientations with clear gradients. Our sparse VAE architecture employs this unified framework and is trained with teacher forcing and progressive-resolution training for stable and efficient convergence. Extensive experiments demonstrate that TopoMesh significantly outperforms existing VAEs in reconstruction fidelity, achieving superior preservation of sharp features and geometric details.
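The key consequence of the shared DMC format is that predicted and target meshes have identical vertex ordering and face connectivity, so mesh-level losses become simple elementwise comparisons. The sketch below is a hypothetical illustration of that idea, not the paper's actual loss: the function names, the L1 vertex term, and the cosine face-orientation term are all assumptions chosen for clarity.

```python
import numpy as np

def mesh_supervision_loss(pred_v, gt_v, faces, w_normal=0.1):
    """Hypothetical mesh-level loss assuming identical topology.

    pred_v, gt_v: (N, 3) vertex arrays in the same (DMC-induced) order.
    faces: (M, 3) triangle index array shared by both meshes.
    """
    # Per-vertex L1 term -- valid only because the shared DMC format
    # gives a one-to-one vertex correspondence between the two meshes.
    v_loss = np.abs(pred_v - gt_v).mean()

    def face_normals(v):
        # Unit normals from the two edge vectors of each triangle.
        e1 = v[faces[:, 1]] - v[faces[:, 0]]
        e2 = v[faces[:, 2]] - v[faces[:, 0]]
        n = np.cross(e1, e2)
        return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)

    # Face-orientation term: 1 - cosine similarity of corresponding normals.
    n_loss = (1.0 - (face_normals(pred_v) * face_normals(gt_v)).sum(axis=1)).mean()
    return v_loss + w_normal * n_loss
```

With indirect supervision (SDF or rendering losses), no such direct per-vertex or per-face comparison is possible, which is the gradient-clarity argument the abstract makes.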