THOM: Generating Physically Plausible Hand-Object Meshes From Text
arXiv cs.CV / 4/6/2026
Key Points
- THOM is a training-free framework for generating physically plausible 3D hand-object interaction (HOI) meshes directly from text, aimed at dexterous robotic grasping and VR/AR content creation.
- It uses a two-stage pipeline that first generates hand and object Gaussians from the text, then performs physics-based HOI optimization after extracting meshes from those Gaussians.
- The approach introduces a new mesh extraction method plus a vertex-to-Gaussian mapping that assigns Gaussian elements to mesh vertices, enabling topology-aware regularization.
- To improve interaction plausibility, the physics-based stage adds translation refinement guided by a vision-language model (VLM) and contact-aware optimization.
- Experiments reported in the paper indicate THOM outperforms existing methods on text alignment, visual realism, and interaction plausibility.
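
To make the vertex-to-Gaussian mapping idea more concrete, here is a minimal sketch in Python. This summary does not say how THOM builds its mapping, so the nearest-vertex rule below is an assumption chosen for illustration, and the names `map_gaussians_to_vertices` and `group_by_vertex` are hypothetical. It only shows what "assigning Gaussian elements to mesh vertices" could look like as a data structure.

```python
import numpy as np

def map_gaussians_to_vertices(gaussian_centers, vertices):
    """Assign each Gaussian to its nearest mesh vertex.

    Assumption: nearest-neighbor assignment in Euclidean space.
    The paper's actual mapping rule may differ.
    """
    # Pairwise squared distances, shape (num_gaussians, num_vertices)
    diff = gaussian_centers[:, None, :] - vertices[None, :, :]
    sq_dist = np.sum(diff * diff, axis=-1)
    return np.argmin(sq_dist, axis=1)

def group_by_vertex(assignment, num_vertices):
    """Invert the assignment: for each vertex, list the Gaussians mapped to it."""
    groups = [[] for _ in range(num_vertices)]
    for gaussian_idx, vertex_idx in enumerate(assignment):
        groups[int(vertex_idx)].append(gaussian_idx)
    return groups

# Toy data: a single triangle and four Gaussian centers near its corners.
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
gaussian_centers = np.array([
    [0.1, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.1],
    [0.2, 0.1, 0.0],
])

assignment = map_gaussians_to_vertices(gaussian_centers, vertices)
groups = group_by_vertex(assignment, len(vertices))
print(assignment.tolist())  # [0, 1, 2, 0]
print(groups)               # [[0, 3], [1], [2]]
```

With a per-vertex grouping like this, a regularizer could in principle use the mesh's edge connectivity to relate Gaussians attached to neighboring vertices, which is one plausible reading of "topology-aware regularization".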