OneHOI: Unifying Human-Object Interaction Generation and Editing
arXiv cs.CV / 4/16/2026
Key Points
- OneHOI introduces a unified diffusion transformer framework that combines human-object interaction (HOI) generation and text-based HOI editing into a single conditional denoising process.
- The approach centers on a Relational Diffusion Transformer (R-DiT) that uses role-/instance-aware HOI tokens, layout-based action grounding, structured HOI attention for interaction topology, and HOI RoPE to disentangle multi-HOI scenes.
- By training jointly with modality dropout on the HOI-Edit-44K dataset plus additional HOI and object-centric data, OneHOI can handle layout-guided, layout-free, arbitrary-mask, and mixed-condition (HOI + object-only) controls.
- The paper reports state-of-the-art performance on both HOI generation and editing benchmarks, with code released via the project website.
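The "structured HOI attention" point above can be illustrated with a minimal sketch: tokens are tagged with an HOI-instance id, and the attention mask lets each token attend only within its own human-object pair plus shared scene tokens, which is one way to disentangle multiple interactions in a scene. The function name, the id-0 convention for shared tokens, and the masking rule are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def hoi_attention_mask(instance_ids):
    """Boolean attention mask over HOI tokens (illustrative sketch).

    Tokens sharing an instance id may attend to each other; tokens
    with id 0 are treated as shared/global scene tokens visible to
    (and from) every instance. The id-0 convention is an assumption.
    """
    ids = np.asarray(instance_ids)
    same_instance = ids[:, None] == ids[None, :]          # within one HOI pair
    via_global = (ids[:, None] == 0) | (ids[None, :] == 0)  # shared scene tokens
    return same_instance | via_global

# Two HOI instances (ids 1 and 2) plus one shared scene token (id 0):
mask = hoi_attention_mask([0, 1, 1, 2, 2])
```

Tokens of instance 1 cannot attend to instance 2 directly, only via the shared token, which mirrors the idea of encoding interaction topology in the attention pattern.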
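The modality-dropout training mentioned above can be sketched simply: each conditioning signal (layout, mask, object reference, etc.) is independently dropped with some probability during training, so a single model learns to operate under any subset of conditions, including layout-free and mixed-condition control. The condition names and dropout rate here are assumptions for illustration.

```python
import random

def drop_modalities(conds, p_drop=0.3, rng=random):
    """Modality dropout for joint training (illustrative sketch).

    Each conditioning input is independently replaced by None with
    probability p_drop; the model must then denoise without it.
    Condition keys and p_drop=0.3 are assumed, not from the paper.
    """
    return {name: (None if rng.random() < p_drop else cond)
            for name, cond in conds.items()}

# Hypothetical condition dict for one training sample:
conds = {"layout": "boxes", "mask": "region", "text": "a person rides a bike"}
dropped = drop_modalities(conds, p_drop=0.3)
```

At p_drop=0 all conditions survive and at p_drop=1 all are dropped, so the same model covers the fully conditioned and fully unconditional regimes.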