Representing 3D Faces with Learnable B-Spline Volumes

arXiv cs.CV · April 15, 2026


Key Points

  • The paper introduces CUBE (Control-based Unified B-spline Encoding), a new learnable geometric representation for 3D human faces that combines B-spline volumes with high-dimensional learned control features.
  • CUBE replaces explicit 3D control points with a lattice of learned feature vectors (e.g., 8×8×8): B-spline local blending produces an intermediate feature vector, and an MLP then predicts a residual displacement that refines the 3D coordinates.
  • The method supports dense semantic correspondence by querying CUBE at template-sampled 3D coordinates to reconstruct surfaces in a consistent parameterization.
  • A key advantage is that CUBE keeps the local support property of traditional B-splines, enabling local surface editing by modifying individual control features.
  • In experiments, transformer-based encoders are trained to predict CUBE control features from unstructured point clouds and monocular images, achieving state-of-the-art scan-registration performance against recent baselines.
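
The two-stage decode described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the lattice size, the feature width of 16, and the `mlp` callable are assumptions chosen for clarity, and uniform cubic B-splines are used for the blending.

```python
import numpy as np

def cubic_bspline_weights(t):
    # Uniform cubic B-spline basis for local parameter t in [0, 1]:
    # each query blends only the 4 nearest controls along one axis.
    return np.array([
        (1 - t) ** 3,
        3 * t**3 - 6 * t**2 + 4,
        -3 * t**3 + 3 * t**2 + 3 * t + 1,
        t**3,
    ]) / 6.0

def query_cube(features, mlp, uvw):
    """CUBE-style two-stage decode at one parametric point uvw in [0,1]^3.

    features: (N, N, N, C) lattice of learned control features.
    mlp: callable mapping a (C,) feature vector to a (3,) residual (stand-in
         for the paper's small MLP; here just any function of the feature).
    """
    N = features.shape[0]
    idx, w = [], []
    for t in uvw:
        x = t * (N - 3)          # map [0,1] onto the valid cubic cell range
        i = min(int(x), N - 4)   # base index of the 4-point support window
        idx.append(i)
        w.append(cubic_bspline_weights(x - i))
    # Stage 1: trivariate local blend over the 4x4x4 control-feature window.
    win = features[idx[0]:idx[0]+4, idx[1]:idx[1]+4, idx[2]:idx[2]+4]
    f = np.einsum('i,j,k,ijkc->c', w[0], w[1], w[2], win)
    base = f[:3]                 # first three channels give the base-mesh point
    # Stage 2: the MLP predicts a residual displacement from the base shape.
    return base + mlp(f)
```

Because the basis weights sum to one, a lattice whose first three channels are constant decodes (with a zero residual) to exactly that constant point, which is a quick sanity check on the blending.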

Abstract

We present CUBE (Control-based Unified B-spline Encoding), a new geometric representation for human faces that combines B-spline volumes with learned features, and demonstrate its use as a decoder for 3D scan registration and monocular 3D face reconstruction. Unlike existing B-spline representations with 3D control points, CUBE is parametrized by a lattice (e.g., 8 × 8 × 8) of high-dimensional control features, increasing the model's expressivity. These features define a continuous, two-stage mapping from a 3D parametric domain to 3D Euclidean space via an intermediate feature space. First, high-dimensional control features are locally blended using the B-spline bases, yielding a high-dimensional feature vector whose first three values define a 3D base mesh. A small MLP then processes this feature vector to predict a residual displacement from the base shape, yielding the final refined 3D coordinates. To reconstruct 3D surfaces in dense semantic correspondence, CUBE is queried at 3D coordinates sampled from a fixed template mesh. Crucially, CUBE retains the local support property of traditional B-spline representations, enabling local surface editing by updating individual control features. We demonstrate the strengths of this representation by training transformer-based encoders to predict CUBE's control features from unstructured point clouds and monocular images, achieving state-of-the-art scan registration results compared to recent baselines.
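
The local support property the abstract highlights follows directly from the B-spline basis: a cubic query touches only 4 consecutive controls per axis, so editing one control feature cannot affect distant surface points. A minimal 1D demonstration (the 8-entry control row and scalar features are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def cubic_weights(t):
    # Uniform cubic B-spline basis: each query blends 4 consecutive controls.
    return np.array([(1 - t)**3, 3*t**3 - 6*t**2 + 4,
                     -3*t**3 + 3*t**2 + 3*t + 1, t**3]) / 6.0

def curve_point(controls, t01):
    # Evaluate a 1D feature curve at t01 in [0, 1] over a control row.
    n = len(controls)
    x = t01 * (n - 3)
    i = min(int(x), n - 4)     # 4-point support window starting at i
    return cubic_weights(x - i) @ controls[i:i+4]

controls = np.zeros(8)
before = [curve_point(controls, t) for t in (0.05, 0.95)]
controls[0] += 1.0             # edit a single control feature
after = [curve_point(controls, t) for t in (0.05, 0.95)]
# The edit changes the curve near t=0.05 but leaves t=0.95 exactly
# untouched, mirroring CUBE's local surface-editing claim.
```

The same locality carries over to the trivariate case: a 3D query reads only a 4×4×4 window of the control-feature lattice.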