Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes

arXiv cs.CV / April 17, 2026


Key Points

  • The paper shows that text-driven inversion for state-of-the-art native text-to-3D generative models often fails when the textual guidance is out-of-distribution, contradicting the assumption that outputs remain sensitive to prompts.
  • It identifies a failure mode where the model’s generation trajectories fall into latent “sink traps,” making the model insensitive to prompt changes and preventing the output geometry from responding.
  • The authors argue this is not due to limits in the models’ geometric expressivity, since the same models can generate diverse shapes by relying on their unconditional generative prior.
  • By analyzing sampling trajectories and decoupling geometric representation power from linguistic sensitivity, the study proposes a more robust framework for text-based 3D shape editing that bypasses latent sinks.
  • The approach aims to enable high-fidelity semantic manipulation of out-of-distribution 3D shapes and to address limitations of current 3D pipelines.

Abstract

Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, or inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode where generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to alter internal representations in a way that alters the output geometry. Crucially, we observe that this is not a limitation of the model's *geometric* expressivity; the same generative models possess the ability to produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution *text* guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts
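The "sink trap" diagnosis can be made concrete with a toy experiment: run two sampling trajectories from the same initial noise under two different conditioning vectors and measure how far apart they end up. The sketch below is purely illustrative, not the paper's actual model or metric; the `toy_denoiser` and the `sink_strength` parameter are hypothetical stand-ins that simulate an attractor dominating the conditioning signal.

```python
# Hypothetical sketch: probing prompt sensitivity along a sampling trajectory.
# The denoiser here is a toy stand-in, NOT the paper's model; the sink
# behaviour is simulated by a fixed attractor that drowns out the condition.
import numpy as np

def toy_denoiser(x, cond, sink_strength):
    """Toy update step: pulls x toward a fixed attractor (the 'sink') while
    the conditioning direction is scaled down as sink_strength grows."""
    attractor = np.ones_like(x)
    return x + 0.1 * (attractor - x) + 0.1 * cond / (1.0 + sink_strength)

def trajectory_divergence(cond_a, cond_b, sink_strength, steps=50, seed=0):
    """Run two trajectories from the same noise under different conditions
    and return the final distance between their end points."""
    rng = np.random.default_rng(seed)
    x_a = rng.standard_normal(8)
    x_b = x_a.copy()
    for _ in range(steps):
        x_a = toy_denoiser(x_a, cond_a, sink_strength)
        x_b = toy_denoiser(x_b, cond_b, sink_strength)
    return float(np.linalg.norm(x_a - x_b))

cond_a, cond_b = np.zeros(8), np.ones(8)  # two distinct "prompt" embeddings
weak_sink = trajectory_divergence(cond_a, cond_b, sink_strength=0.0)
strong_sink = trajectory_divergence(cond_a, cond_b, sink_strength=100.0)
print(weak_sink, strong_sink)  # divergence collapses as the sink dominates
```

When the sink is weak, swapping the condition visibly changes the endpoint; when it dominates, the two trajectories converge to nearly the same output, which mirrors the paper's observation that prompt edits stop affecting geometry.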