Points-to-3D: Structure-Aware 3D Generation with Point Cloud Priors

arXiv cs.CV / 3/20/2026

📰 News · Models & Research

Key Points

  • Points-to-3D presents a diffusion-based framework that uses point cloud priors to enable geometry-controllable 3D asset and scene generation, built on the TRELLIS latent 3D diffusion model.
  • The method replaces pure-noise latent initialization with an input formulation tailored to point-cloud priors and includes a structure inpainting network trained within TRELLIS for global structural inpainting.
  • It employs a staged sampling strategy (structural inpainting followed by boundary refinement) to complete global geometry while preserving the visible regions from input priors.
  • The approach accepts accurate point-cloud priors or VGGT-estimated point clouds from single images and demonstrates superior rendering quality and geometric fidelity compared with state-of-the-art baselines.
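The staged sampling described above can be sketched at a high level. This is a minimal toy illustration, not the paper's actual implementation: the function names (`staged_sampling`, `denoise_step`, `dilate`), the dense occupancy-grid representation, and the step counts are all assumptions; the real method operates on TRELLIS sparse structure latents with a trained diffusion network.

```python
import numpy as np

def dilate(mask):
    """Toy 3D binary dilation by one voxel along each axis."""
    out = mask.copy()
    for axis in range(mask.ndim):
        out |= np.roll(mask, 1, axis) | np.roll(mask, -1, axis)
    return out

def staged_sampling(prior_occ, visible_mask, denoise_step,
                    n_inpaint=8, n_refine=4, seed=0):
    """Hypothetical two-stage sampler: structural inpainting, then boundary
    refinement, re-imposing the visible-region prior after every step."""
    rng = np.random.default_rng(seed)
    # Prior-tailored initialization instead of pure noise (assumed scheme):
    x = np.where(visible_mask, prior_occ, rng.random(prior_occ.shape))
    # Stage 1: structural inpainting completes the global geometry.
    for _ in range(n_inpaint):
        x = denoise_step(x)
        x[visible_mask] = prior_occ[visible_mask]  # preserve visible regions
    # Stage 2: refine only a band around the visible/occluded boundary.
    band = dilate(visible_mask) & ~visible_mask
    for _ in range(n_refine):
        proposal = denoise_step(x)
        x[band] = proposal[band]
        x[visible_mask] = prior_occ[visible_mask]
    return x
```

The key invariant the sketch illustrates is that the visible-region prior is clamped back into the sample after every denoising step, so the diffusion model only "fills in" unobserved geometry.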

Abstract

Recent progress in 3D generation has been driven largely by models conditioned on images or text, while readily available 3D priors remain underused. In many real-world scenarios, visible-region point clouds are easy to obtain from active sensors such as LiDAR or from feed-forward predictors like VGGT, offering explicit geometric constraints that current methods fail to exploit. In this work, we introduce Points-to-3D, a diffusion-based framework that leverages point cloud priors for geometry-controllable 3D asset and scene generation. Built on the latent 3D diffusion model TRELLIS, Points-to-3D first replaces pure-noise sparse structure latent initialization with an input formulation tailored to point-cloud priors. A structure inpainting network, trained within the TRELLIS framework on task-specific data designed to teach global structural inpainting, is then used at inference with a staged sampling strategy (structural inpainting followed by boundary refinement), completing the global geometry while preserving the visible regions of the input priors. In practice, Points-to-3D can take either accurate point-cloud priors or VGGT-estimated point clouds from single images as input. Experiments on both object and scene scenarios consistently demonstrate superior performance over state-of-the-art baselines in rendering quality and geometric fidelity, highlighting the effectiveness of explicitly embedding point-cloud priors for more accurate and structurally controllable 3D generation.