Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

arXiv cs.LG / 5/4/2026

📰 News · Models & Research

Key Points

  • The paper identifies two key issues in centralized multimodal learning: modality dominance (the model focuses on easier signals) and spurious modality coupling (overfitting to incidental cross-modal correlations).
  • It proposes Group Cognition Learning (GCL), a governed two-stage collaboration framework applied after modality-specific encoders.
  • In Stage 1 (Selective Interaction), a Routing Agent proposes interaction paths and an Auditing Agent applies sample-wise gates to boost exchanges with positive marginal predictive gain while suppressing redundant coupling.
  • In Stage 2 (Consensus Formation), a Public-Factor Agent maintains a shared explicit factor and an Aggregation Agent forms final predictions using contribution-aware weighting while keeping modality representations as specialized channels.
  • Experiments on CMU-MOSI, CMU-MOSEI, and MIntRec show that GCL reduces both dominance and coupling and achieves state-of-the-art results on regression and classification benchmarks, supported by additional analysis experiments.
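The Stage 1 gating logic described above can be sketched on toy features. Everything below is illustrative, not the paper's implementation: `predict` is a stand-in for a downstream head, the "exchange" is a simple interpolation standing in for cross-modal attention, and the squared error against a label serves as a proxy for marginal predictive gain.

```python
def predict(features):
    # Toy downstream head: mean of all feature values across modalities.
    vals = [v for f in features.values() for v in f]
    return sum(vals) / len(vals)

def stage1_selective_interaction(features, target):
    """Hypothetical sketch of GCL Stage 1 (Selective Interaction).
    The Routing Agent proposes all directed routes (src -> dst); the
    Auditing Agent keeps a route only if the exchange yields positive
    marginal predictive gain, i.e. it lowers the proxy squared error."""
    gated = {m: list(f) for m, f in features.items()}
    base_err = (predict(gated) - target) ** 2
    routes = [(s, d) for s in features for d in features if s != d]
    kept = []
    for src, dst in routes:
        trial = {m: list(f) for m, f in gated.items()}
        # Exchange: nudge dst toward src (stand-in for cross-modal attention).
        trial[dst] = [0.5 * a + 0.5 * b for a, b in zip(trial[dst], trial[src])]
        err = (predict(trial) - target) ** 2
        if err < base_err:  # gate opens only on positive marginal gain
            gated, base_err = trial, err
            kept.append((src, dst))
        # else: the gate suppresses this route (redundant coupling)
    return gated, kept
```

On a toy input where text carries most of the signal, the audit keeps routes flowing out of text and rejects those that would dilute it, which is the suppression-of-redundant-coupling behavior the key points describe.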

Abstract

Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where optimization gravitates towards the path of least resistance and ignores weaker but informative modalities, and spurious modality coupling, where models overfit to incidental cross-modal correlations. To address these limitations, we propose Group Cognition Learning (GCL), a governed collaboration paradigm that applies a two-stage protocol after modality-specific encoding. In Stage 1 (Selective Interaction), a Routing Agent proposes directed interaction routes, and an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain while suppressing redundant coupling. In Stage 2 (Consensus Formation), a Public-Factor Agent maintains an explicit shared factor, and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that GCL mitigates dominance and coupling, establishing state-of-the-art results across both regression and classification benchmarks. Additional analysis experiments further validate the design.
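Stage 2's consensus formation can likewise be sketched with toy vectors. This is a minimal illustration under simplifying assumptions, not the paper's method: the shared factor is taken as the element-wise mean, the contribution scores (learned in the actual framework) are faked as inverse distance to that factor, and a per-modality mean stands in for each modality's readout head.

```python
def stage2_consensus(features):
    """Hypothetical sketch of GCL Stage 2 (Consensus Formation).
    Public-Factor Agent: the shared explicit factor is the element-wise
    mean of all modality representations. Aggregation Agent: contribution-
    aware weights favor modalities that agree with the shared factor, while
    each modality keeps its own representation as a specialization channel."""
    public = [sum(vs) / len(vs) for vs in zip(*features.values())]
    # Contribution score: inverse L2 distance to the public factor
    # (a stand-in for learned contribution estimates).
    scores = {}
    for m, f in features.items():
        dist = sum((a - b) ** 2 for a, b in zip(f, public)) ** 0.5
        scores[m] = 1.0 / (1e-6 + dist)
    z = sum(scores.values())
    weights = {m: s / z for m, s in scores.items()}
    # Final prediction: contribution-weighted combination of per-modality
    # readouts (here, the mean of each modality vector).
    pred = sum(w * (sum(features[m]) / len(features[m]))
               for m, w in weights.items())
    return pred, weights, public
```

Because the weighting acts on separate per-modality readouts rather than one fused vector, no modality's representation is overwritten, mirroring the "specialization channel" property described in the abstract.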