PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning
arXiv cs.CV / 5/4/2026
Key Points
- The paper argues that instruction-following image editors need adaptive inference along two axes, spatial coverage and reasoning depth, because different edits (e.g., a color swap versus a physical-action change) require different computation budgets.
- It introduces PhysEdit, a region-aware image editing framework that adds two inference-time modules without retraining the backbone: Complexity-Adaptive Reasoning Depth (CARD) and a Spatial Reasoning Mask (SRM).
- CARD predicts each sample's edit complexity from the instruction and the reference image, then allocates the number of reasoning steps and the reasoning-token length accordingly, turning a fixed inference schedule into conditional computation.
- SRM uses instruction-conditioned cross-attention to produce a spatial prior that restricts reasoning to semantically relevant regions, so the model spends its effort on the areas the edit affects rather than on the whole image.
- On the ImgEdit Basic-Edit Suite (737 cases), PhysEdit achieves a 1.18× wall-clock speedup while slightly improving instruction adherence and maintaining identity preservation, with larger gains (up to 1.52×) for appearance-level edits.
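The conditional allocation described for CARD can be pictured as a mapping from a predicted complexity score to a discrete reasoning budget. This is a minimal sketch under stated assumptions, not PhysEdit's implementation: `estimate_complexity` is a hypothetical keyword heuristic standing in for CARD's learned predictor, `image_entropy` is a hypothetical image statistic in [0, 1] standing in for reference-image features, and the thresholds, step counts, and token lengths in `BUDGET_TIERS` are invented for illustration.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL budget tiers: (upper complexity bound, reasoning steps, token length).
# The summary does not give CARD's actual thresholds or budgets.
BUDGET_TIERS = [
    (0.33, 1, 64),   # appearance-level edits, e.g. a color swap
    (0.66, 2, 128),  # moderate edits
    (1.00, 4, 256),  # edits involving physical actions
]

# HYPOTHETICAL cue words suggesting a physical-action edit.
PHYSICAL_CUES = {"push", "pour", "break", "fall", "melt", "open", "lift"}


@dataclass
class ReasoningBudget:
    steps: int
    max_tokens: int


def estimate_complexity(instruction: str, image_entropy: float) -> float:
    """Hypothetical stand-in for CARD's learned complexity predictor.

    Mixes a keyword cue from the instruction with an image statistic,
    then clamps the result to [0, 1].
    """
    words = set(re.findall(r"[a-z]+", instruction.lower()))
    text_score = 1.0 if words & PHYSICAL_CUES else 0.2
    score = 0.7 * text_score + 0.3 * image_entropy
    return min(max(score, 0.0), 1.0)


def allocate_budget(complexity: float) -> ReasoningBudget:
    """Pick the first tier whose upper bound covers the complexity score."""
    for upper, steps, tokens in BUDGET_TIERS:
        if complexity <= upper:
            return ReasoningBudget(steps, tokens)
    _, steps, tokens = BUDGET_TIERS[-1]  # fall back to the largest budget
    return ReasoningBudget(steps, tokens)


# Example: a cheap appearance edit versus a physical-action edit.
print(allocate_budget(estimate_complexity("Change the car color to red", 0.3)))
print(allocate_budget(estimate_complexity("Make the glass fall off the table", 0.5)))
```

With these assumed tiers, the color swap lands in the cheapest tier (1 step, 64 tokens) while the falling-glass instruction receives the largest budget (4 steps, 256 tokens). A tiered lookup keeps the schedule discrete, so easy edits skip reasoning steps entirely instead of running a fixed schedule.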
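SRM's spatial prior can be illustrated with plain scaled dot-product cross-attention, where instruction-token embeddings act as queries over image-patch embeddings. This is a hypothetical sketch: the summary does not say how PhysEdit aggregates attention across instruction tokens or turns the prior into a mask, so averaging over tokens and keeping the top `keep_ratio` fraction of patches are assumptions, as are the toy 2-D embeddings in the example.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_prior(text_tokens: List[Vector], patches: List[Vector]) -> Vector:
    """Average, over instruction tokens, of each token's attention onto image patches.

    Each token's weights sum to 1, so the averaged prior also sums to 1.
    """
    d = len(patches[0])
    prior = [0.0] * len(patches)
    for q in text_tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in patches]
        for i, w in enumerate(softmax(scores)):
            prior[i] += w / len(text_tokens)
    return prior


def spatial_mask(prior: Vector, keep_ratio: float = 0.5) -> List[bool]:
    """ASSUMED thresholding rule: keep the top keep_ratio fraction of patches."""
    k = max(1, round(keep_ratio * len(prior)))
    keep = set(sorted(range(len(prior)), key=lambda i: prior[i], reverse=True)[:k])
    return [i in keep for i in range(len(prior))]


# Toy example: one instruction token aligned with the first embedding axis.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.2], [-1.0, 0.0]]
prior = cross_attention_prior([[2.0, 0.0]], patches)
mask = spatial_mask(prior, keep_ratio=0.5)
relevant = [p for p, keep in zip(patches, mask) if keep]
print(mask)
```

Here patches 0 and 2 receive most of the attention mass and are the only ones kept; any downstream reasoning would then operate on `relevant` instead of every patch, which is the kind of restriction the summary attributes to SRM.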