Rethinking Image-to-3D Generation with Sparse Queries: Efficiency, Capacity, and Input-View Bias

arXiv cs.CV · April 16, 2026


Key Points

  • The paper introduces SparseGen, a new framework for image-to-3D generation that replaces dense 3D representations with a compact set of learned 3D anchor queries plus a learned expansion operator.
  • SparseGen transforms each anchor query into a small local set of 3D Gaussian primitives, enabling faster inference and lower memory use than volumetric grids, triplanes, or pixel-aligned primitive methods.
  • It is trained using a rectified-flow reconstruction objective without any 3D supervision, aiming to improve generalization from sparse conditioning.
  • The authors report reduced input-view bias and improved capacity utilization, arguing the sparse query mechanism helps avoid overfitting to particular conditioning views while maintaining multi-view fidelity.
  • The work proposes quantitative measures of input-view bias and representation utilization to support the claim that sparse set-latent expansion is a practical alternative for efficient 3D generative modeling.

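The anchor-query mechanism described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the sizes (256 queries, 64-d latents, 8 Gaussians per anchor), the shared linear expansion operator `W_expand`, and the per-Gaussian parameterization (mean offset, log-scale, quaternion, opacity, color) are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 256 anchor queries, 64-d latents, 8 Gaussians per anchor.
N_QUERIES, D_LATENT, K_PER_ANCHOR = 256, 64, 8
# Per-Gaussian parameters: 3 mean offset + 3 log-scale + 4 quaternion + 1 opacity + 3 color.
P_GAUSS = 3 + 3 + 4 + 1 + 3

# Learned state (random stand-ins here): anchor positions, anchor latents,
# and a linear expansion operator shared across all anchors.
anchor_xyz = rng.normal(size=(N_QUERIES, 3))
anchor_lat = rng.normal(size=(N_QUERIES, D_LATENT))
W_expand = rng.normal(size=(D_LATENT, K_PER_ANCHOR * P_GAUSS)) * 0.02

def expand(anchor_xyz, anchor_lat):
    """Decode each anchor query into a small local set of Gaussian primitives."""
    raw = anchor_lat @ W_expand                            # (N, K * P)
    raw = raw.reshape(N_QUERIES, K_PER_ANCHOR, P_GAUSS)
    offsets, log_scale, quat, opac, color = np.split(raw, [3, 6, 10, 11], axis=-1)
    means = anchor_xyz[:, None, :] + offsets               # Gaussians stay near their anchor
    scales = np.exp(log_scale)                             # strictly positive scales
    quat = quat / np.linalg.norm(quat, axis=-1, keepdims=True)  # unit rotations
    opac = 1.0 / (1.0 + np.exp(-opac))                     # sigmoid to (0, 1)
    color = 1.0 / (1.0 + np.exp(-color))
    return means, scales, quat, opac, color

means, scales, quat, opac, color = expand(anchor_xyz, anchor_lat)
print(means.shape)  # (256, 8, 3): 2048 Gaussians decoded from 256 queries
```

The efficiency claim follows from the shapes: the scene is carried by 256 × 64 latent values rather than a dense grid or per-pixel primitives, and the full Gaussian set is only materialized by the cheap expansion step at decode time.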
Abstract

We present SparseGen, a novel framework for efficient image-to-3D generation that exhibits low input-view bias while being significantly faster than dense-representation baselines. Unlike traditional approaches that rely on dense volumetric grids, triplanes, or pixel-aligned primitives, we model scenes with a compact sparse set of learned 3D anchor queries and a learned expansion operator that decodes each transformed query into a small local set of 3D Gaussian primitives. Trained under a rectified-flow reconstruction objective without 3D supervision, our model learns to allocate representation capacity where geometry and appearance matter, achieving significant reductions in memory and inference time while preserving multi-view fidelity. We introduce quantitative measures of input-view bias and utilization to show that sparse queries reduce overfitting to conditioning views while being representationally efficient. Our results argue that sparse set-latent expansion is a principled, practical alternative for efficient 3D generative modeling.
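The rectified-flow objective mentioned in the abstract can be illustrated with a generic sketch. In rectified flow, a model is trained to predict the constant velocity of a straight-line path between a noise sample and a data sample; the snippet below shows that loss in its simplest form on latent vectors. This is an assumption-laden toy (the `model_v` interface, batch shapes, and uniform time sampling are not from the paper, and SparseGen's supervision reportedly comes from rendered views rather than raw latents).

```python
import numpy as np

rng = np.random.default_rng(0)

def rectified_flow_loss(model_v, x1, rng):
    """One rectified-flow training step on a batch of targets x1, shape (B, D).

    Sample a noise endpoint x0 and a per-sample time t, form the straight-line
    interpolant x_t = (1 - t) * x0 + t * x1, and regress the model's velocity
    prediction at (x_t, t) onto the constant path velocity x1 - x0.
    """
    x0 = rng.normal(size=x1.shape)             # Gaussian noise endpoint
    t = rng.uniform(size=(x1.shape[0], 1))     # per-sample time in [0, 1]
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0                           # straight-path velocity
    pred = model_v(x_t, t)
    return np.mean((pred - target) ** 2)

# Toy stand-in "model": predicts zero velocity everywhere, so the loss
# reduces to the mean squared norm of the path velocities.
zero_model = lambda x_t, t: np.zeros_like(x_t)
x1 = rng.normal(size=(4, 16))
loss = rectified_flow_loss(zero_model, x1, rng)
print(float(loss))  # some positive scalar
```

Training "without 3D supervision" would mean this regression is driven only by image-space reconstruction of the decoded Gaussians from posed views, never by ground-truth geometry.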