Mantis: Mamba-native Tuning is Efficient for 3D Point Cloud Foundation Models

arXiv cs.CV / 5/6/2026


Key Points

  • The paper presents Mantis, a new Mamba-native parameter-efficient fine-tuning (PEFT) framework specifically for pre-trained 3D point cloud foundation models (PFMs).
  • It argues that existing PEFT methods for Transformer backbones do not transfer well to frozen Mamba models due to a mismatch between token-level adaptation and Mamba’s state-level sequence dynamics.
  • Mantis introduces a State-Aware Adapter (SAA) that injects lightweight, task-conditioned control signals into the selective state-space updates, enabling stable state-level adaptation while keeping the backbone frozen (a minimal sketch follows this list).
  • It also proposes Dual-Serialization Consistency Distillation (DSCD), which enforces consistency across different valid point cloud serializations to mitigate serialization-induced instability.
  • Experiments on multiple benchmarks show Mantis achieves competitive results while training only about 5% of the parameters, and the authors provide open-source code.
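To make the State-Aware Adapter idea concrete, here is a minimal PyTorch sketch. It assumes the adapter is a small bottleneck MLP, conditioned on a learnable task embedding, whose output is added to the selective state-space update of a frozen Mamba block; the class name `StateAwareAdapter`, the `d_state`/`bottleneck` parameters, and the additive injection point are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class StateAwareAdapter(nn.Module):
    """Hypothetical sketch of a state-level adapter for a frozen Mamba block.

    A bottleneck MLP, conditioned on a learnable task embedding, produces an
    additive control signal for the selective state-space update while the
    backbone weights stay frozen. The real SAA in the paper may differ.
    """
    def __init__(self, d_model: int, d_state: int, bottleneck: int = 16):
        super().__init__()
        self.task_embed = nn.Parameter(torch.zeros(d_model))  # task-conditioned signal
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_state)
        nn.init.zeros_(self.up.weight)  # zero-init: no perturbation at the start
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) token features entering the SSM.
        # Returns a (batch, seq_len, d_state) control signal that could be
        # added to the state update, e.g. h_t = A_bar h_{t-1} + B_bar x_t + ctrl_t.
        ctrl = self.up(torch.tanh(self.down(x + self.task_embed)))
        return ctrl
```

Zero-initializing the up-projection keeps the frozen backbone's behavior unchanged at the beginning of fine-tuning, a common trick for stabilizing adapter training.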

Abstract

Pre-trained 3D point cloud foundation models (PFMs) have demonstrated strong transferability across diverse downstream tasks. However, fully fine-tuning these models is computationally expensive and storage-intensive. Parameter-efficient fine-tuning (PEFT) offers a promising alternative, but existing PEFT approaches are primarily designed for Transformer-based backbones and rely on token-level prompting or feature transformation. Mamba-based backbones introduce a granularity mismatch between token-level adaptation and state-level sequence dynamics. Consequently, straightforward transfer of existing PEFT approaches to frozen Mamba backbones leads to substantial accuracy degradation and unstable optimization. To address this issue, we propose Mantis, the first Mamba-native PEFT framework for 3D PFMs. Specifically, a State-Aware Adapter (SAA) is introduced to inject lightweight task-conditioned control signals into selective state-space updates, enabling state-level adaptation while keeping the pre-trained backbone frozen. Moreover, different valid point cloud serializations are regularized by Dual-Serialization Consistency Distillation (DSCD), thereby reducing serialization-induced instability. Extensive experiments across multiple benchmarks demonstrate that Mantis achieves competitive performance with only about 5% trainable parameters. Our code is available at https://github.com/gzhhhhhhh/Mantis.
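The consistency term in DSCD can be pictured as a distillation loss between the model's predictions under two valid serializations of the same point cloud (e.g., two different space-filling-curve orderings). The sketch below is a hypothetical formulation using a symmetric, temperature-scaled KL divergence; the function name `dual_serialization_consistency_loss` and the exact form of the loss are assumptions, not the paper's published objective.

```python
import torch.nn.functional as F

def dual_serialization_consistency_loss(logits_a, logits_b, temperature: float = 1.0):
    """Hypothetical consistency-distillation term between predictions obtained
    from two serializations (e.g. Hilbert vs. z-order) of the same point cloud.
    Symmetric KL between temperature-softened distributions; the paper's exact
    loss may differ."""
    log_p_a = F.log_softmax(logits_a / temperature, dim=-1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba) * temperature ** 2
```

In practice such a term would be added to the task loss so that, regardless of which serialization the frozen Mamba backbone processes, the adapted model is pushed toward the same prediction, which is one plausible way to reduce serialization-induced instability.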