OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism

arXiv cs.CV / 4/17/2026



Key Points

The paper introduces OmniGCD, a modality-agnostic approach to Generalized Category Discovery (GCD) that aims to discover both known and novel classes from partially labeled data.
OmniGCD uses separate modality-specific encoders (e.g., vision, audio, text, remote sensing), then applies dimension reduction to build a shared GCD latent space, which a Transformer-based model transforms at test time into a representation better suited for clustering.
Key contributions include this Transformer-based model, trained only on synthetic data, and a new zero-shot GCD evaluation setting that forbids dataset-specific fine-tuning.
Evaluated on 16 datasets spanning four modalities, OmniGCD reports improved accuracy for both known and novel classes over baselines, with average gains of +6.2, +17.9, +1.5, and +12.7 percentage points for vision, text, audio, and remote sensing, respectively.
The authors position OmniGCD as a benchmark for future modality-agnostic GCD research and argue that decoupling representation learning from category discovery can accelerate encoder development across modalities.
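The pipeline in the Key Points (frozen encoder, dimension reduction to a shared space, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: synthetic Gaussian vectors stand in for frozen encoder outputs, PCA stands in for the unspecified dimension-reduction step, the test-time Transformer is omitted (the reduced space is clustered directly), and clustering uses semi-supervised k-means, a common GCD baseline. The function names `synthetic_embeddings`, `gcd_split`, `pca_reduce`, and `ss_kmeans` are hypothetical.

```python
import numpy as np


def synthetic_embeddings(n_per_class, n_classes, dim, seed):
    """Hypothetical stand-in for a frozen encoder: Gaussian clusters in `dim` dims."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=5.0, size=(n_classes, dim))
    y = np.repeat(np.arange(n_classes), n_per_class)
    x = centers[y] + rng.normal(size=(len(y), dim))
    return x, y


def gcd_split(y, n_known, frac_labeled=0.5, seed=0):
    """GCD-style split: only classes < n_known can be labeled, and only a
    random fraction of them is. Everything else (including all novel-class
    samples) is unlabeled."""
    rng = np.random.default_rng(seed)
    return (y < n_known) & (rng.random(len(y)) < frac_labeled)


def pca_reduce(x, d):
    """Project onto the top-d principal components and L2-normalise, so every
    modality ends up in a space of the same width regardless of encoder size."""
    x = x - x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x, full_matrices=False)  # rows of vt: principal directions
    z = x @ vt[:d].T
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def ss_kmeans(z, y, labeled, k, iters=50, seed=0):
    """Semi-supervised k-means: labeled points stay in their class's cluster.
    (Assumption: OmniGCD's learned test-time Transformer would transform z
    before this step; here z is clustered as-is.)"""
    rng = np.random.default_rng(seed)
    known = np.unique(y[labeled])
    # Known-class centroids start at the mean of their labeled points.
    init = [z[labeled & (y == c)].mean(axis=0) for c in known]
    # Remaining centroids (for novel classes) start at random unlabeled points.
    extra = rng.choice(np.flatnonzero(~labeled), size=k - len(known), replace=False)
    centroids = np.vstack(init + [z[i] for i in extra])
    fixed = np.searchsorted(known, y[labeled])  # cluster id of each labeled point
    for _ in range(iters):
        d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = d2.argmin(axis=1)
        assign[labeled] = fixed  # labeled points never change cluster
        for j in range(k):
            members = z[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return assign


if __name__ == "__main__":
    # Two "modalities" with different encoder widths share one downstream pipeline.
    for name, dim in [("vision", 768), ("audio", 512)]:
        x, y = synthetic_embeddings(40, 6, dim, seed=dim)
        mask = gcd_split(y, n_known=3)
        z = pca_reduce(x, d=16)
        pred = ss_kmeans(z, y, mask, k=6)
        print(name, z.shape, np.bincount(pred, minlength=6))
```

Because each modality is reduced to the same width `d`, everything after `pca_reduce` runs unchanged for a 768- or 512-dimensional encoder. That is the kind of decoupling between representation learning and category discovery the paper argues for; the choice of PCA and `d=16` here is purely illustrative.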

Abstract

Generalized Category Discovery (GCD) challenges methods to identify known and novel classes using partially labeled data, mirroring human category learning. Unlike prior GCD methods, which operate within a single modality and require dataset-specific fine-tuning, we propose a modality-agnostic GCD approach inspired by the human brain's abstract category formation. Our OmniGCD leverages modality-specific encoders (e.g., vision, audio, text, remote sensing) to process inputs, followed by dimension reduction to construct a GCD latent space, which is transformed at test-time into a representation better suited for clustering using a novel synthetically trained Transformer-based model. To evaluate OmniGCD, we introduce a zero-shot GCD setting where no dataset-specific fine-tuning is allowed, enabling modality-agnostic category discovery. Trained once on synthetic data, OmniGCD performs zero-shot GCD across 16 datasets spanning four modalities, improving classification accuracy for known and novel classes over baselines (average percentage point improvement of +6.2, +17.9, +1.5 and +12.7 for vision, text, audio and remote sensing). This highlights the importance of strong encoders while decoupling representation learning from category discovery. Improving modality-agnostic methods will propagate across modalities, enabling encoder development independent of GCD. Our work serves as a benchmark for future modality-agnostic GCD works, paving the way for scalable, human-inspired category discovery. All code is available at https://github.com/Jordan-HS/OmniGCD.
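The accuracy gains above are reported separately for known and novel classes. GCD work commonly measures this with clustering accuracy: cluster ids are first matched to class labels by the single mapping that maximizes correct assignments over the whole unlabeled set, then accuracy is reported on the known and novel subsets. The sketch below assumes that standard protocol; the summary does not spell out OmniGCD's exact metric. The exhaustive permutation search is only practical for a handful of clusters; `scipy.optimize.linear_sum_assignment` solves the same assignment problem efficiently. The function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Sequence, Set


def best_mapping(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, int]:
    """Cluster-to-class mapping that maximizes matches (brute force, small k only)."""
    k = max(max(y_true), max(y_pred)) + 1
    # counts[c][t] = number of samples placed in cluster c whose true class is t
    counts = [[0] * k for _ in range(k)]
    for t, c in zip(y_true, y_pred):
        counts[c][t] += 1
    best_perm, best_hits = None, -1
    for perm in permutations(range(k)):  # perm[c] = class assigned to cluster c
        hits = sum(counts[c][perm[c]] for c in range(k))
        if hits > best_hits:
            best_perm, best_hits = perm, hits
    return {c: best_perm[c] for c in range(k)}


def gcd_accuracy(y_true: Sequence[int], y_pred: Sequence[int],
                 known_classes: Set[int]) -> Dict[str, float]:
    """All / known / novel accuracy, using ONE mapping computed over all samples."""
    mapping = best_mapping(y_true, y_pred)
    mapped = [mapping[c] for c in y_pred]

    def acc(idx):
        if not idx:
            return float("nan")
        return sum(mapped[i] == y_true[i] for i in idx) / len(idx)

    all_idx = list(range(len(y_true)))
    known_idx = [i for i in all_idx if y_true[i] in known_classes]
    novel_idx = [i for i in all_idx if y_true[i] not in known_classes]
    return {"all": acc(all_idx), "known": acc(known_idx), "novel": acc(novel_idx)}


if __name__ == "__main__":
    # Classes 0 and 1 are known; class 2 is novel. One novel sample is mis-clustered.
    print(gcd_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 0], {0, 1}))
```

In this toy example the best mapping sends cluster 0 to class 1, cluster 1 to class 0, and cluster 2 to class 2, giving 5/6 overall, 1.0 on known classes, and 0.5 on novel classes. Computing a single mapping for all samples, rather than one per subset, keeps the known and novel scores from being inflated by separate matchings.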

Related Articles

FastAPI With LangChain and MongoDB (Dev.to)
[Patterns] AI Agent Error Handling That Actually Works (Dev.to)
Building ONNX Embedding Workflows in Oracle AI Database with Python (Dev.to)
🌱 Green Habit Tracker (Dev.to)
[2026] OpenTelemetry for LLM Observability — Self-Hosted Setup (Dev.to)
