P3T: Prototypical Point-level Prompt Tuning with Enhanced Generalization for 3D Vision-Language Models

arXiv cs.CV / 4/20/2026


Key Points

  • The paper introduces Prototypical Point-level Prompt Tuning (P3T), a parameter-efficient way to adapt pre-trained 3D vision-language models to downstream point-cloud tasks without the cost of full fine-tuning.
  • P3T uses two learned components—a Point Prompter that creates instance-aware point-level prompts from the input cloud and a Text Prompter that injects learnable prompts into the text—reducing reliance on handcrafted prompts.
  • To improve generalization, the method adds a prototypical loss aimed at aligning embedding spaces by lowering intra-category variance.
  • Experiments on classification and few-shot learning show performance on par with or better than full fine-tuning, including strong robustness to data shifts across datasets.
  • The authors release code on GitHub, enabling others to reproduce and build on the approach for 3D VLM adaptation.
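The core idea behind prompt tuning, as described above, is to freeze the pre-trained backbone and learn only a small set of prompt parameters prepended to the input. The sketch below illustrates that mechanism in PyTorch; the class name, shapes, and initialization are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class PromptTunedEncoder(nn.Module):
    """Minimal prompt-tuning sketch (hypothetical design): a frozen
    encoder with a small set of learnable prompt tokens prepended
    to its token sequence, so only the prompts receive gradients."""

    def __init__(self, encoder: nn.Module, num_prompts: int, dim: int):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # backbone stays frozen
        # Learnable prompt tokens, one (num_prompts, dim) block shared
        # across the batch; small-std init is a common convention.
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim)
        b = tokens.shape[0]
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        # Prepend prompts to the input sequence, then run the
        # frozen encoder over the augmented sequence.
        return self.encoder(torch.cat([prompts, tokens], dim=1))
```

With, say, 4 prompt tokens of dimension 512, the trainable parameter count is 4 × 512 regardless of backbone size, which is what makes the approach parameter-efficient.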

Abstract

With the rise of pre-trained models in the 3D point cloud domain for a wide range of real-world applications, adapting them to downstream tasks has become increasingly important. However, conventional full fine-tuning methods are computationally expensive and storage-intensive. Although prompt tuning has emerged as an efficient alternative, it often suffers from overfitting, thereby compromising generalization capability. To address this issue, we propose Prototypical Point-level Prompt Tuning (P3T), a parameter-efficient prompt tuning method designed for pre-trained 3D vision-language models (VLMs). P3T consists of two components: 1) *Point Prompter*, which generates instance-aware point-level prompts for the input point cloud, and 2) *Text Prompter*, which injects learnable prompts into the input text in place of hand-crafted ones. Since both prompters operate directly on input data, P3T enables task-specific adaptation of 3D VLMs without sacrificing generalizability. Furthermore, to enhance embedding space alignment, which is key to fine-tuning 3D VLMs, we introduce a prototypical loss that reduces intra-category variance. Extensive experiments demonstrate that our method matches or outperforms full fine-tuning in classification and few-shot learning, and further exhibits robust generalization under data shift in the cross-dataset setting. The code is available at https://github.com/gyjung975/P3T.
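The prototypical loss described in the abstract pulls each embedding toward the mean embedding (prototype) of its category, directly penalizing intra-category variance. A minimal sketch of that idea is below; the function name and the mean-squared-distance formulation are plausible assumptions, since the abstract does not give the exact objective.

```python
import torch

def prototypical_loss(features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Sketch of an intra-category variance penalty (assumed form):
    for each class present in the batch, compute its prototype as the
    mean embedding, then average the squared distance of each member
    embedding to that prototype."""
    classes = labels.unique()
    loss = features.new_zeros(())
    for c in classes:
        members = features[labels == c]          # (n_c, dim)
        proto = members.mean(dim=0)              # class prototype
        # Mean squared distance of members to their prototype.
        loss = loss + ((members - proto) ** 2).sum(dim=1).mean()
    return loss / classes.numel()
```

When all embeddings of a class coincide, the loss for that class is exactly zero, so minimizing it collapses each category toward a single point in embedding space, which is the alignment effect the abstract attributes to the loss.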