Registration-Free Learnable Multi-View Capture of Faces in Dense Semantic Correspondence

arXiv cs.CV / 5/5/2026


Key Points

  • The paper introduces MOCHI, a multi-view 3D face prediction framework that learns dense semantic correspondence without needing registered training data.
  • MOCHI removes the dependency on slow manual registration by enforcing topological consistency using a pseudo-linear inverse kinematics solver, while semantic alignment is driven by dense keypoints from a 2D landmark predictor trained on synthetic data.
  • The authors find that conventional point-to-surface distance losses can cause training instabilities and visual artifacts in registration-free settings, and they propose pointmap- and normal-based losses to improve gradient smoothness and reconstruction quality.
  • A test-time optimization method further refines network weights for a few dozen iterations, improving accuracy and visual fidelity beyond purely feed-forward approaches.
  • The authors report that MOCHI can outperform traditional labor-intensive registration pipelines; code and models are released publicly for reproducibility.

Abstract

Recent frameworks like ToFu and TEMPEH provide an automated alternative to classical registration pipelines by predicting 3D meshes in dense semantic correspondence directly from calibrated multi-view images. However, these learning-based methods rely on the slow, manual registration pipelines they aim to replace for their training supervision. We overcome this limitation with MOCHI (Multi-view Optimizable Correspondence of Heads from Images), a multi-view 3D face prediction framework trained without requiring registered training data. MOCHI eliminates the registration data dependency by enforcing topological consistency through a pseudo-linear inverse kinematics solver. Semantic alignment is guided by dense keypoints from a 2D landmark predictor trained exclusively on synthetic data. Our analysis further reveals that standard point-to-surface distances induce training instabilities and visual artifacts in registration-free settings. We propose pointmap- and normal-based losses instead, which provide smoother gradients and superior reconstruction fidelity. Finally, we introduce a test-time optimization scheme that refines network weights over a few dozen iterations. This approach bridges the gap between feed-forward efficiency and iterative optimization precision, allowing MOCHI to outperform traditional labor-intensive pipelines in both reconstruction accuracy and visual quality. Code and models are publicly available at: https://filby89.github.io/mochi.
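
Illustrative Sketches

To make the semantic-alignment idea concrete, the sketch below shows one common way dense 2D keypoints can supervise a multi-view mesh predictor: project the predicted vertices at fixed keypoint indices through each calibrated camera and penalize the distance to the detected landmarks. This is a minimal illustration, not the paper's implementation; the function name, tensor shapes, keypoint-to-vertex mapping, and the choice of an L1 penalty are all assumptions.

```python
# Hedged sketch of dense-keypoint semantic alignment (not the authors' code).
# Assumes a fixed mesh topology, so each dense keypoint maps to a vertex index.
import torch

def keypoint_reprojection_loss(verts, kp_indices, cams, landmarks_2d):
    """verts: (B, V, 3) predicted mesh vertices in a fixed topology.
    kp_indices: (K,) vertex indices corresponding to dense keypoints.
    cams: (B, C, 3, 4) per-view camera projection matrices.
    landmarks_2d: (B, C, K, 2) detected 2D keypoints per view.
    """
    kp3d = verts[:, kp_indices]                                       # (B, K, 3)
    homo = torch.cat([kp3d, torch.ones_like(kp3d[..., :1])], dim=-1)  # (B, K, 4)
    proj = torch.einsum('bcij,bkj->bcki', cams, homo)                 # (B, C, K, 3)
    uv = proj[..., :2] / proj[..., 2:].clamp(min=1e-8)                # perspective divide
    return (uv - landmarks_2d).abs().mean()                           # L1 over views/keypoints
```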
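
The key points note that point-to-surface distances were found unstable in the registration-free setting, and that pointmap- and normal-based losses give smoother gradients. A plausible reading is a per-pixel comparison between quantities rendered from the predicted mesh and quantities derived from the raw scan, as sketched below; the rasterization inputs, masking, and the equal weighting of the two terms are assumptions. One intuition for the smoother behavior: each pixel compares against a fixed target, so gradients do not jump when a vertex's nearest surface point changes.

```python
# Hedged sketch of pointmap- and normal-based supervision. Assumes a
# differentiable rasterizer has already produced per-pixel 3D coordinates
# ("pointmaps") and unit normals for both the prediction and the scan.
import torch
import torch.nn.functional as F

def pointmap_normal_loss(pred_points, pred_normals, scan_points, scan_normals, mask):
    """pred_points/scan_points: (B, H, W, 3) per-pixel 3D coordinates per view.
    pred_normals/scan_normals: (B, H, W, 3) per-pixel unit normals.
    mask: (B, H, W) bool, valid pixels covered by both renderings.
    """
    m = mask.unsqueeze(-1).float()
    # L1 on per-pixel 3D coordinates, averaged over valid pixels.
    l_point = (m * (pred_points - scan_points).abs()).sum() / m.sum().clamp(min=1.0)
    # Cosine penalty on per-pixel normals.
    cos = F.cosine_similarity(pred_normals, scan_normals, dim=-1)     # (B, H, W)
    l_normal = ((1.0 - cos) * mask.float()).sum() / mask.float().sum().clamp(min=1.0)
    return l_point + l_normal
```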
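
Finally, the test-time optimization scheme can be pictured as a short fine-tuning loop: starting from the pretrained feed-forward weights, take a few dozen gradient steps on the same self-supervised losses for the test subject's views, then run one final forward pass. Everything below (optimizer choice, learning rate, step count, model and loss signatures) is an illustrative assumption, not the paper's configuration.

```python
# Hedged sketch of per-subject test-time optimization.
import copy
import torch

def test_time_refine(model, batch, loss_fn, steps=50, lr=1e-5):
    model = copy.deepcopy(model)          # keep the pretrained weights intact
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):                # "a few dozen iterations"
        opt.zero_grad()
        verts = model(batch['images'], batch['cams'])
        loss_fn(verts, batch).backward()  # e.g. keypoint + pointmap/normal terms
        opt.step()
    model.eval()
    with torch.no_grad():
        return model(batch['images'], batch['cams'])
```

Refining the weights rather than the output mesh keeps the network's learned prior in the loop, which is one way such a scheme can bridge feed-forward efficiency and iterative-optimization precision.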