GenFusion: Feed-forward Human Performance Capture via Progressive Canonical Space Updates
arXiv cs.CV / 4/1/2026
Key Points
- The paper introduces GenFusion, a feed-forward human performance capture method that synthesizes novel views from a single monocular RGB video stream.
- It addresses missing observations in unseen body regions by maintaining a canonical space that is progressively updated frame by frame as the subject moves.
- The canonical space acts as a time-accumulated “context bank” to provide appearance information when the current frame lacks direct visibility.
- Rendering is cast as probabilistic regression to better reconcile past (canonical/context) and current (live deformation) observations, yielding sharper results than deterministic regression.
- Experiments on 4D-Dress (in-domain) and MVHumanNet (out-of-distribution) show improved reconstruction quality and plausible synthesis even where no prior observations exist.
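The progressive canonical-space update described above can be sketched roughly as a visibility-gated running blend of per-point features. This is a minimal illustrative sketch, not the paper's implementation: the function name, array shapes, the momentum blend, and the confidence counter are all assumptions made here for clarity.

```python
import numpy as np

def update_canonical_bank(bank, bank_conf, frame_feats, visibility, momentum=0.9):
    """Fold one frame's observations into a canonical-space feature bank.

    Hypothetical sketch: each of the N canonical points carries a feature
    vector; points visible in the current frame are blended toward the new
    observation, while unseen points keep their accumulated appearance.

    bank:        (N, D) accumulated canonical features ("context bank")
    bank_conf:   (N,)   how often each point has been observed so far
    frame_feats: (N, D) features observed in the current frame
    visibility:  (N,)   1.0 where the point is visible this frame, else 0.0
    """
    vis = visibility[:, None]  # broadcast over the feature dimension
    # Blend old and new features; the blend only takes effect where visible.
    blended = momentum * bank + (1.0 - momentum) * frame_feats
    bank = np.where(vis > 0, blended, bank)
    # Track per-point observation counts as a crude confidence signal.
    bank_conf = bank_conf + visibility
    return bank, bank_conf
```

Under this sketch, regions never seen keep zero confidence, which is where the paper's probabilistic rendering would fall back on plausible synthesis rather than accumulated appearance.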