Representing 3D Faces with Learnable B-Spline Volumes

arXiv cs.CV · April 15, 2026


Key Points

  • The paper introduces CUBE (Control-based Unified B-spline Encoding), a new learnable geometric representation for 3D human faces that combines B-spline volumes with high-dimensional learned control features.
  • CUBE replaces explicit 3D control points with a lattice of learned feature vectors (e.g., 8×8×8): B-spline local blending produces an intermediate feature vector, and an MLP then predicts a residual displacement that refines the 3D coordinates.
  • The method supports dense semantic correspondence by querying CUBE at template-sampled 3D coordinates to reconstruct surfaces in a consistent parameterization.
  • A key advantage is that CUBE keeps the local support property of traditional B-splines, enabling local surface editing by modifying individual control features.
  • In experiments, transformer-based encoders are trained to predict CUBE control features from unstructured point clouds and monocular images, achieving state-of-the-art scan-registration performance against recent baselines.
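
The two-stage decode described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the lattice size, the feature width of 16, and the `mlp` callable are assumptions chosen for clarity, and uniform cubic B-splines are used for the blending.

```python
import numpy as np

def cubic_bspline_weights(t):
    # Uniform cubic B-spline basis for local parameter t in [0, 1]:
    # each query blends only the 4 nearest controls along one axis.
    return np.array([
        (1 - t) ** 3,
        3 * t**3 - 6 * t**2 + 4,
        -3 * t**3 + 3 * t**2 + 3 * t + 1,
        t**3,
    ]) / 6.0

def query_cube(features, mlp, uvw):
    """CUBE-style two-stage decode at one parametric point uvw in [0,1]^3.

    features: (N, N, N, C) lattice of learned control features.
    mlp: callable mapping a (C,) feature vector to a (3,) residual (stand-in
         for the paper's small MLP; here just any function of the feature).
    """
    N = features.shape[0]
    idx, w = [], []
    for t in uvw:
        x = t * (N - 3)          # map [0,1] onto the valid cubic cell range
        i = min(int(x), N - 4)   # base index of the 4-point support window
        idx.append(i)
        w.append(cubic_bspline_weights(x - i))
    # Stage 1: trivariate local blend over the 4x4x4 control-feature window.
    win = features[idx[0]:idx[0]+4, idx[1]:idx[1]+4, idx[2]:idx[2]+4]
    f = np.einsum('i,j,k,ijkc->c', w[0], w[1], w[2], win)
    base = f[:3]                 # first three channels give the base-mesh point
    # Stage 2: the MLP predicts a residual displacement from the base shape.
    return base + mlp(f)
```

Because the basis weights sum to one, a lattice whose first three channels are constant decodes (with a zero residual) to exactly that constant point, which is a quick sanity check on the blending.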

Abstract

We present CUBE (Control-based Unified B-spline Encoding), a new geometric representation for human faces that combines B-spline volumes with learned features, and demonstrate its use as a decoder for 3D scan registration and monocular 3D face reconstruction. Unlike existing B-spline representations with 3D control points, CUBE is parametrized by a lattice (e.g., 8 × 8 × 8) of high-dimensional control features, increasing the model's expressivity. These features define a continuous, two-stage mapping from a 3D parametric domain to 3D Euclidean space via an intermediate feature space. First, high-dimensional control features are locally blended using the B-spline bases, yielding a high-dimensional feature vector whose first three values define a 3D base mesh. A small MLP then processes this feature vector to predict a residual displacement from the base shape, yielding the final refined 3D coordinates. To reconstruct 3D surfaces in dense semantic correspondence, CUBE is queried at 3D coordinates sampled from a fixed template mesh. Crucially, CUBE retains the local support property of traditional B-spline representations, enabling local surface editing by updating individual control features. We demonstrate the strengths of this representation by training transformer-based encoders to predict CUBE's control features from unstructured point clouds and monocular images, achieving state-of-the-art scan registration results compared to recent baselines.
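
The local support property the abstract highlights follows directly from the B-spline basis: a cubic query touches only 4 consecutive controls per axis, so editing one control feature cannot affect distant surface points. A minimal 1D demonstration (the 8-entry control row and scalar features are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def cubic_weights(t):
    # Uniform cubic B-spline basis: each query blends 4 consecutive controls.
    return np.array([(1 - t)**3, 3*t**3 - 6*t**2 + 4,
                     -3*t**3 + 3*t**2 + 3*t + 1, t**3]) / 6.0

def curve_point(controls, t01):
    # Evaluate a 1D feature curve at t01 in [0, 1] over a control row.
    n = len(controls)
    x = t01 * (n - 3)
    i = min(int(x), n - 4)     # 4-point support window starting at i
    return cubic_weights(x - i) @ controls[i:i+4]

controls = np.zeros(8)
before = [curve_point(controls, t) for t in (0.05, 0.95)]
controls[0] += 1.0             # edit a single control feature
after = [curve_point(controls, t) for t in (0.05, 0.95)]
# The edit changes the curve near t=0.05 but leaves t=0.95 exactly
# untouched, mirroring CUBE's local surface-editing claim.
```

The same locality carries over to the trivariate case: a 3D query reads only a 4×4×4 window of the control-feature lattice.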