SIC3D: Style Image Conditioned Text-to-3D Gaussian Splatting Generation

arXiv cs.CV / 4/13/2026



Key Points

SIC3D is a two-stage, image-conditioned text-to-3D generation pipeline that combines guidance from 2D diffusion models with 3D Gaussian Splatting (3DGS) to produce controllable 3D objects from a text prompt and a reference style image.
The first stage generates the 3D object content from the text prompt using a text-to-3DGS generation model.
The second (stylization) stage transfers style from a reference image to the 3DGS representation using a novel Variational Stylized Score Distillation (VSSD) loss that captures both global and local texture patterns while mitigating conflicts between geometry and appearance.
SIC3D further applies a scaling regularization to prevent artifacts from emerging and to preserve the patterns of the style image.
The authors report that SIC3D improves geometric fidelity and style adherence, achieving stronger qualitative and quantitative performance than prior text-to-3D approaches.
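The summary does not spell out the VSSD loss. As background only, the standard Score Distillation Sampling (SDS) gradient from DreamFusion and the Variational Score Distillation (VSD) gradient from ProlificDreamer are shown below; that VSSD builds on VSD is an assumption suggested by its name, not something the summary states, and how SIC3D injects the style image is not reflected here. Notation (assumed for illustration): θ are the 3DGS parameters, g(θ, c) is the image rendered from camera c, y is the text prompt, ε_pre is the pretrained diffusion model's noise prediction, ε_φ is a fine-tuned (e.g. LoRA) noise predictor modeling the distribution of rendered images, and ω(t) is a timestep weight.

```latex
x_t = \alpha_t\, g(\theta, c) + \sigma_t\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon,c}\!\left[ \omega(t)\,\big(\epsilon_{\mathrm{pre}}(x_t, t, y) - \epsilon\big)\,
    \frac{\partial g(\theta, c)}{\partial \theta} \right]

\nabla_\theta \mathcal{L}_{\mathrm{VSD}}
  = \mathbb{E}_{t,\epsilon,c}\!\left[ \omega(t)\,\big(\epsilon_{\mathrm{pre}}(x_t, t, y) - \epsilon_\phi(x_t, t, c, y)\big)\,
    \frac{\partial g(\theta, c)}{\partial \theta} \right]
```

The only difference between the two is the subtracted term: SDS subtracts the injected noise ε, while VSD subtracts the prediction of a model fitted to the current renders, which is what makes VSD "variational".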

Abstract

Recent progress in text-to-3D object generation enables the synthesis of detailed geometry from text input by leveraging 2D diffusion models and differentiable 3D representations. However, these approaches often suffer from limited controllability and texture ambiguity due to the limitations of the text modality. To address this, we present SIC3D, a controllable image-conditioned text-to-3D generation pipeline with 3D Gaussian Splatting (3DGS). SIC3D consists of two stages. The first stage generates the 3D object content from text with a text-to-3DGS generation model. The second stage transfers style from a reference image to the 3DGS. Within this stylization stage, we introduce a novel Variational Stylized Score Distillation (VSSD) loss to effectively capture both global and local texture patterns while mitigating conflicts between geometry and appearance. A scaling regularization is further applied to prevent the emergence of artifacts and preserve the patterns of the style image. Extensive experiments demonstrate that SIC3D enhances geometric fidelity and style adherence, outperforming prior approaches in both qualitative and quantitative evaluations.
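The abstract does not give the form of the scaling regularization. The sketch below is a hypothetical illustration of one common way to regularize 3DGS scales, not SIC3D's formulation: it penalizes Gaussians whose largest axis exceeds a size threshold (oversized splats tend to smear fine texture) and needle-like Gaussians whose anisotropy ratio exceeds a bound. The function name `scaling_regularization` and the thresholds `max_scale` and `max_ratio` are assumptions, and NumPy is used only for readability; a training loop would compute the same quantity in an autodiff framework so it can be added to the VSSD loss.

```python
import numpy as np


def scaling_regularization(log_scales: np.ndarray,
                           max_scale: float = 0.05,
                           max_ratio: float = 10.0) -> float:
    """Hypothetical scale penalty for a set of 3D Gaussians.

    log_scales: (N, 3) per-axis log-scales, as many 3DGS implementations
    store them. max_scale and max_ratio are illustrative thresholds.
    """
    scales = np.exp(log_scales)          # (N, 3) actual axis lengths
    largest = scales.max(axis=1)         # longest axis of each Gaussian
    smallest = scales.min(axis=1)        # shortest axis of each Gaussian

    # Hinge penalty on Gaussians whose longest axis exceeds max_scale.
    size_term = np.maximum(largest - max_scale, 0.0)
    # Hinge penalty on needle-like Gaussians (longest / shortest > max_ratio).
    ratio_term = np.maximum(largest / smallest - max_ratio, 0.0)

    return float(size_term.mean() + ratio_term.mean())


if __name__ == "__main__":
    # One small isotropic Gaussian and one large, elongated one.
    demo = np.log(np.array([[0.01, 0.01, 0.01],
                            [0.20, 0.01, 0.01]]))
    print(scaling_regularization(demo))
```

In this toy input only the second Gaussian is penalized, on both terms; in practice the penalty would be weighted and summed with the stylization loss at each optimization step.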