Mining Attribute Subspaces for Efficient Fine-tuning of 3D Foundation Models

arXiv cs.CV / 4/14/2026


Key Points

  • The paper investigates whether LoRA adapters for 3D foundation models can be decomposed into separate subspaces tied to specific variation factors such as texture, geometry, camera motion, and lighting.
  • It proposes a method that creates synthetic 3D datasets with controlled, single-type variations, fine-tunes a LoRA adapter per synthetic dataset, and then derives the corresponding LoRA subspaces.
  • The authors find that the extracted subspaces are approximately disentangled, meaning the subspaces for different sources of variation are close to orthogonal to one another.
  • By integrating the subspaces, the method produces a reduced LoRA subspace that enables more efficient fine-tuning while improving downstream prediction accuracy.
  • Although the reduced subspace is derived entirely from synthetic data, the study reports that it generalizes to real-world 3D datasets; a separate ablation study validates the design choices of the approach.
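
As a rough illustration of the extraction and disentanglement points above, the sketch below takes the top left singular vectors of a LoRA update as its subspace and scores the overlap between two subspaces via principal angles. The summary does not state how the paper extracts or compares subspaces, so the SVD-based extraction, the overlap score, and the names `lora_subspace` and `subspace_overlap` are all assumptions; random matrices stand in for trained adapters.

```python
import numpy as np

def lora_subspace(B, A, k):
    """Return an orthonormal basis (d x k) for the top-k column space
    of the LoRA update delta_W = B @ A. Hypothetical helper."""
    delta_w = B @ A
    U, _, _ = np.linalg.svd(delta_w, full_matrices=False)
    return U[:, :k]

def subspace_overlap(U1, U2):
    """Mean squared cosine of the principal angles between two subspaces.
    0 means orthogonal, 1 means identical. Hypothetical metric."""
    cosines = np.linalg.svd(U1.T @ U2, compute_uv=False)
    return float(np.mean(cosines ** 2))

rng = np.random.default_rng(0)
d, r = 64, 4  # layer width and LoRA rank (illustrative values)

# Random stand-ins for adapters fine-tuned on "texture-only" and
# "geometry-only" synthetic datasets (assumption: no real adapters here).
B_tex, A_tex = rng.normal(size=(d, r)), rng.normal(size=(r, d))
B_geo, A_geo = rng.normal(size=(d, r)), rng.normal(size=(r, d))

U_tex = lora_subspace(B_tex, A_tex, r)
U_geo = lora_subspace(B_geo, A_geo, r)

overlap_self = subspace_overlap(U_tex, U_tex)   # a subspace against itself
overlap_cross = subspace_overlap(U_tex, U_geo)  # two different factors
print(overlap_self, overlap_cross)
```

With trained adapters in place of the random stand-ins, a cross-factor overlap close to 0 would be consistent with the approximate disentanglement the paper reports.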

Abstract

With the emergence of 3D foundation models, there is growing interest in fine-tuning them for downstream tasks, where LoRA is the dominant fine-tuning paradigm. As 3D datasets exhibit distinct variations in texture, geometry, camera motion, and lighting, several fundamental questions arise: 1) Are there LoRA subspaces associated with each type of variation? 2) Are these subspaces disentangled (i.e., orthogonal to each other)? 3) How do we compute them effectively? This paper provides answers to all these questions. We introduce a robust approach that generates synthetic datasets with controlled variations, fine-tunes a LoRA adapter on each dataset, and extracts a LoRA subspace associated with each type of variation. We show that these subspaces are approximately disentangled. Integrating them leads to a reduced LoRA subspace that enables efficient LoRA fine-tuning with improved prediction accuracy for downstream tasks. In particular, we show that such a reduced LoRA subspace, despite being derived entirely from synthetic data, generalizes to real datasets. An ablation study validates the effectiveness of the choices in our approach.
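
One plausible reading of "integrating" the per-factor subspaces is to concatenate their bases and orthonormalize the result, then constrain any new update to that span. The abstract does not describe the actual integration procedure, so the sketch below is an assumption throughout: `merge_subspaces` and `project_update` are hypothetical helpers, and random orthonormal bases stand in for the texture, geometry, camera-motion, and lighting subspaces.

```python
import numpy as np

def merge_subspaces(bases, tol=1e-8):
    """Orthonormal basis for the span of several subspace bases.
    Directions with negligible singular values are dropped."""
    stacked = np.concatenate(bases, axis=1)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    keep = s > tol * s[0]
    return U[:, keep]

def project_update(delta_w, Q):
    """Restrict a weight update to the column space spanned by Q."""
    return Q @ (Q.T @ delta_w)

rng = np.random.default_rng(0)
d, k = 64, 4  # layer width and per-factor subspace size (illustrative)

# Four random orthonormal bases as stand-ins for the per-factor subspaces.
bases = [np.linalg.qr(rng.normal(size=(d, k)))[0] for _ in range(4)]

Q = merge_subspaces(bases)            # reduced subspace, at most 4 * k dims
delta_w = rng.normal(size=(d, d))     # an unconstrained update
reduced = project_update(delta_w, Q)  # the same update confined to span(Q)
print(Q.shape, np.linalg.matrix_rank(reduced))
```

Under this reading, fine-tuning would only need to learn the coefficients of the update within `Q` (a `Q.shape[1]` by `d` matrix here) rather than a full `d` by `d` update, which is where the efficiency gain would come from.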