PhyEdit: Towards Real-World Object Manipulation via Physically-Grounded Image Editing

arXiv cs.CV / 4/9/2026


Key Points

  • The paper introduces PhyEdit, a physically grounded image editing framework for more precise real-world object manipulation; it addresses scaling and positioning failures caused by the absence of explicit 3D-geometry and perspective-projection mechanisms.
  • PhyEdit improves manipulation accuracy by using a plug-and-play explicit 3D prior with geometric simulation as 3D-aware guidance, combined with joint 2D–3D supervision.
  • The authors release RealManip-10K, a real-world dataset containing paired images and depth annotations to support 3D-aware object manipulation research and evaluation.
  • They also propose ManipEval, a benchmark with multi-dimensional metrics to assess 3D spatial control and geometric consistency.
  • Experiments indicate PhyEdit outperforms prior approaches, including strong closed-source models, on both 3D geometric accuracy and manipulation consistency.

Abstract

Achieving physically accurate object manipulation in image editing is essential for potential applications such as interactive world models. However, existing visual generative models often fail at precise spatial manipulation, resulting in incorrect scaling and positioning of objects. This limitation primarily stems from the lack of explicit mechanisms to incorporate 3D geometry and perspective projection. To achieve accurate manipulation, we develop PhyEdit, an image editing framework that leverages explicit geometric simulation as contextual 3D-aware visual guidance. By combining this plug-and-play 3D prior with joint 2D–3D supervision, our method effectively improves physical accuracy and manipulation consistency. To support this method and evaluate performance, we present a real-world dataset, RealManip-10K, for 3D-aware object manipulation featuring paired images and depth annotations. We also propose ManipEval, a benchmark with multi-dimensional metrics to evaluate 3D spatial control and geometric consistency. Extensive experiments show that our approach outperforms existing methods, including strong closed-source models, in both 3D geometric accuracy and manipulation consistency.
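To make the failure mode concrete: under perspective projection, an object's on-image size depends on its depth, so a 2D editor that moves an object without a depth model will scale it incorrectly. The sketch below is a minimal pinhole-camera illustration of that constraint; it is not the paper's implementation, and the function names are hypothetical.

```python
# Minimal pinhole-camera sketch (illustrative only, not PhyEdit's actual method):
# moving an object along the depth axis must rescale its projected size.

def project(point, f):
    """Perspective-project a 3D point (x, y, z) with focal length f."""
    x, y, z = point
    return (f * x / z, f * y / z)

def apparent_scale(z_old, z_new):
    """On-image scale factor when an object moves from depth z_old to z_new."""
    return z_old / z_new

# A point offset of 2 units at depth 4 projects to image coordinate 0.5.
print(project((2.0, 0.0, 4.0), f=1.0))  # → (0.5, 0.0)

# Halving the depth (4 → 2) doubles the apparent size.
print(apparent_scale(4.0, 2.0))  # → 2.0
```

A purely 2D edit that repositions an object without applying this depth-dependent rescaling produces the incorrect scaling the abstract describes, which is what an explicit 3D prior with geometric simulation is meant to prevent.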