Harnessing the Power of Foundation Models for Accurate Material Classification
arXiv cs.CV / 3/19/2026
📰 News · Tools & Practical Usage · Models & Research
Key Points
- The paper proposes a framework that harnesses vision-language foundation models to address data scarcity in material classification.
- It introduces a robust image generation and auto-labeling pipeline that creates diverse, high-quality material-centric training data by fusing object semantics and material attributes in prompts.
- It adds a prior-incorporation strategy that distills knowledge from VLMs, plus a joint fine-tuning method that optimizes a pre-trained vision model together with the VLM-derived priors, preserving generalizability while adapting to material-specific features.
- Experiments on multiple datasets show significant improvements: the synthetic data effectively captures real-world material characteristics, and the VLM-derived priors further boost final performance. The authors announce the release of the source code and dataset.
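The prompt-fusion idea behind the generation and auto-labeling pipeline can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the object names, material names, attribute adjectives, and prompt template below are all hypothetical assumptions.

```python
from itertools import product

def fuse_prompts(objects, materials, attributes):
    """Cross object semantics with (material, attribute) pairs to build
    diverse generation prompts; the material named in each prompt doubles
    as the auto-generated classification label."""
    samples = []
    for obj, (mat, attr) in product(objects, zip(materials, attributes)):
        samples.append({
            # Illustrative template; a real pipeline would feed this prompt
            # to a text-to-image model to synthesize training images.
            "prompt": f"a close-up photo of a {attr} {mat} {obj}",
            "label": mat,  # auto-label: the material stated in the prompt
        })
    return samples

samples = fuse_prompts(
    objects=["mug", "chair"],
    materials=["ceramic", "wood"],
    attributes=["glossy", "grainy"],
)
for s in samples:
    print(s["label"], "->", s["prompt"])
```

Crossing every object with every material-attribute pair is one way the pipeline could diversify appearances of the same material class, so the downstream classifier sees each material across many object contexts.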
