Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models
arXiv cs.CV / 5/5/2026
Key Points
- Generalized Category Discovery (GCD) is studied under domain shifts, addressing the gap in prior work that typically assumes a single domain for unlabeled data.
- The paper proposes three frameworks (HiLo, HLPrompt, and VLPrompt) that adapt foundation models, ranging from self-supervised vision backbones to vision-language models, to handle both domain and semantic variation.
- HiLo disentangles domain features from semantic features using multi-level feature extraction and mutual information minimization, together with training strategies such as PatchMix augmentation and curriculum sampling.
- HLPrompt builds on HiLo with semantic-aware spatial prompt tuning to reduce the impact of background and domain noise during category discovery.
- VLPrompt uses vision-language models with factorized textual prompts and cross-modal consistency regularization, achieving consistent gains on both synthetic corruptions and multi-domain real-world shift settings.
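
To make the augmentation mentioned in the HiLo point more concrete, here is a minimal sketch of a generic patch-level mixing augmentation in NumPy. It is an illustration under stated assumptions, not the paper's PatchMix implementation: the function name `patch_mix`, the parameters `patch` and `mix_prob`, and the choice of independent per-patch Bernoulli sampling are all hypothetical. The idea shown is only that patches from a second image (for example, one drawn from a different domain) replace the matching patches of the first.

```python
import numpy as np


def patch_mix(img_a, img_b, patch=4, mix_prob=0.5, rng=None):
    """Replace random square patches of img_a with the matching patches of img_b.

    Hypothetical helper for illustration only; not the paper's PatchMix.
    Returns the mixed image and the fraction of patches kept from img_a.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must share a shape")
    h, w = img_a.shape[:2]
    if h % patch or w % patch:
        raise ValueError("height and width must be divisible by patch")
    rng = np.random.default_rng() if rng is None else rng

    grid_h, grid_w = h // patch, w // patch
    # One Bernoulli draw per patch: True means "take this patch from img_b".
    take_b = rng.random((grid_h, grid_w)) < mix_prob
    # Expand the patch-level mask to pixel resolution.
    mask = np.repeat(np.repeat(take_b, patch, axis=0), patch, axis=1)
    if img_a.ndim == 3:
        mask = mask[..., None]  # broadcast over the channel axis

    mixed = np.where(mask, img_b, img_a)
    keep_a = 1.0 - float(take_b.mean())
    return mixed, keep_a


if __name__ == "__main__":
    a = np.zeros((8, 8, 3))
    b = np.ones((8, 8, 3))
    mixed, keep_a = patch_mix(a, b, patch=4, rng=np.random.default_rng(0))
    print(mixed.shape, keep_a)
```

The returned fraction could be used to weight a mixed training target, although whether and how the paper does this is not stated in the summary above.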