ReConText3D: Replay-based Continual Text-to-3D Generation

arXiv cs.CV / 4/16/2026


Key Points

  • ReConText3D is proposed as the first continual text-to-3D generation framework, aiming to learn new 3D categories from text incrementally while avoiding catastrophic forgetting.
  • The authors show that existing text-to-3D models degrade under incremental training, motivating a replay-based approach that preserves performance on previously learned categories.
  • ReConText3D builds a compact, diverse replay memory using text-embedding k-Center selection, enabling rehearsal of prior knowledge without changing the underlying generative model architecture.
  • The paper introduces Toys4K-CL, a class-incremental benchmark derived from Toys4K with balanced and semantically diverse splits to evaluate continual text-to-3D learning systematically.
  • Experiments on Toys4K-CL indicate ReConText3D outperforms baselines across multiple generative backbones, maintaining high-quality generation for both old and newly learned classes.

Abstract

Continual learning enables models to acquire new knowledge over time while retaining previously learned capabilities. However, its application to text-to-3D generation remains unexplored. We present ReConText3D, the first framework for continual text-to-3D generation. We first demonstrate that existing text-to-3D models suffer from catastrophic forgetting under incremental training. ReConText3D enables generative models to incrementally learn new 3D categories from textual descriptions while preserving the ability to synthesize previously seen assets. Our method constructs a compact and diverse replay memory through text-embedding k-Center selection, allowing representative rehearsal of prior knowledge without modifying the underlying architecture. To systematically evaluate continual text-to-3D learning, we introduce Toys4K-CL, a benchmark derived from the Toys4K dataset that provides balanced and semantically diverse class-incremental splits. Extensive experiments on the Toys4K-CL benchmark show that ReConText3D consistently outperforms all baselines across different generative backbones, maintaining high-quality generation for both old and new classes. To the best of our knowledge, this work establishes the first continual learning framework and benchmark for text-to-3D generation, opening a new direction for incremental 3D generative modeling. Project page is available at: https://mauk95.github.io/ReConText3D/.
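The replay memory described above is built with k-Center selection over text embeddings. The paper's implementation is not reproduced here; the following is a minimal sketch of the standard greedy k-Center (farthest-point sampling) strategy it names, assuming NumPy and hypothetical function and variable names (`k_center_select`, `embeddings`):

```python
import numpy as np

def k_center_select(embeddings, k, seed=0):
    """Greedy k-Center selection: pick k indices whose embeddings
    maximally cover the embedding space (farthest-point sampling)."""
    n = embeddings.shape[0]
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary first center
    # distance from every point to its nearest selected center so far
    dists = np.linalg.norm(embeddings - embeddings[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))    # point farthest from all current centers
        selected.append(nxt)
        new_d = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        dists = np.minimum(dists, new_d)  # refresh nearest-center distances
    return selected

# toy usage: 100 random "text embeddings", keep a 10-sample replay memory
emb = np.random.default_rng(1).normal(size=(100, 32))
memory_idx = k_center_select(emb, k=10)
```

Greedy k-Center yields a compact yet diverse subset: each new pick is the embedding least well covered by the memory so far, which matches the paper's goal of representative rehearsal without architectural changes.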