SocialMirror: Reconstructing 3D Human Interaction Behaviors from Monocular Videos with Semantic and Geometric Guidance
arXiv cs.CV / 4/16/2026
Key Points
- SocialMirror is a diffusion-based framework for reconstructing 3D human interaction behaviors from monocular videos, targeting hard close-contact scenarios with heavy mutual occlusions.
- It combines semantic guidance from vision-language-generated interaction descriptions with a semantic-guided motion infiller to hallucinate occluded bodies and resolve local pose ambiguities.
- It improves temporal consistency using a sequence-level temporal refiner that produces smooth, jitter-free motion across frames.
- During sampling, SocialMirror enforces geometric constraints to maintain plausible contact and correct spatial relationships between interacting people.
- Experiments on multiple interaction benchmarks report state-of-the-art 3D interactive mesh reconstruction performance, with strong generalization to unseen datasets and in-the-wild videos. The authors plan to release code upon publication.
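
To make the idea of enforcing geometric constraints during sampling more concrete, here is a minimal sketch of one guidance step on a toy contact loss. It is not SocialMirror's actual method, which the summary above does not specify. The function names (`contact_loss_and_grad`, `guided_step`), the joint-pair contact representation, and the squared-gap loss are all assumptions chosen for illustration. In a diffusion sampler, a correction like this would typically be applied to the denoised estimate at each sampling step.

```python
import numpy as np

def contact_loss_and_grad(joints_a, joints_b, contact_pairs, target_dist=0.0):
    """Toy contact loss (hypothetical): sum of squared gaps between the
    distance of each paired joint and a target contact distance.
    Returns the loss and its gradients w.r.t. both joint arrays."""
    grad_a = np.zeros_like(joints_a)
    grad_b = np.zeros_like(joints_b)
    loss = 0.0
    for i, j in contact_pairs:
        diff = joints_a[i] - joints_b[j]
        dist = np.linalg.norm(diff)
        gap = dist - target_dist
        loss += gap ** 2
        if dist > 1e-8:  # skip the gradient when the joints coincide
            g = 2.0 * gap * diff / dist
            grad_a[i] += g
            grad_b[j] -= g
    return loss, grad_a, grad_b

def guided_step(joints_a, joints_b, contact_pairs, step_size=0.1, target_dist=0.0):
    """One gradient-descent correction that nudges paired joints toward
    the target contact distance. Returns updated joints and the loss
    measured before the update."""
    loss, grad_a, grad_b = contact_loss_and_grad(
        joints_a, joints_b, contact_pairs, target_dist
    )
    return joints_a - step_size * grad_a, joints_b - step_size * grad_b, loss

# Two people, one joint each, one metre apart, assumed to be in contact.
person_a = np.array([[0.0, 0.0, 0.0]])
person_b = np.array([[1.0, 0.0, 0.0]])
new_a, new_b, loss_before = guided_step(person_a, person_b, [(0, 0)])
loss_after, _, _ = contact_loss_and_grad(new_a, new_b, [(0, 0)])
print(loss_before, loss_after)
```

Each call pulls the paired joints toward each other, so the contact loss shrinks from one step to the next. A practical system would likely add further terms, such as a penetration penalty between body meshes, which this sketch omits.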