CoGR-MoE: Concept-Guided Expert Routing with Consistent Selection and Flexible Reasoning for Visual Question Answering

arXiv cs.CV / 4/21/2026



Key Points

The paper introduces CoGR-MoE, a Mixture-of-Experts framework for Visual Question Answering that aims to stabilize expert routing while keeping reasoning flexible.
CoGR-MoE uses the semantics of the answer options during training to guide expert selection, addressing the inconsistent expert selection that unstable routing causes within the same question type.
After routing, it reweights selected experts using option features to produce discriminative, option-level representations.
The method leverages these option-level representations for option comparison and further improves them using contrastive learning, achieving strong results across multiple VQA tasks.
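The routing and reweighting steps summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify how option features produce expert weights, so the dot-product scoring against per-expert key vectors (`expert_keys`), the helper names, and the choice of `k` are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_indices(scores, k):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def option_representations(router_scores, expert_outputs, expert_keys, option_feats, k=2):
    """Pick one shared set of experts, then reweight it separately per option.

    router_scores  : one routing score per expert
    expert_outputs : one output vector per expert
    expert_keys    : hypothetical per-expert key vectors (an assumption)
    option_feats   : one feature vector per candidate answer option
    """
    # Routing: every option shares the same selected experts.
    selected = top_k_indices(router_scores, k)
    dim = len(expert_outputs[0])
    reps = []
    for feat in option_feats:
        # Reweighting: each option scores the selected experts with its own features.
        weights = softmax([dot(feat, expert_keys[i]) for i in selected])
        rep = [
            sum(w * expert_outputs[i][d] for w, i in zip(weights, selected))
            for d in range(dim)
        ]
        reps.append(rep)
    return selected, reps
```

In this sketch the expert set is fixed once per question while the mixing weights differ per option, which is one way to read "consistent selection" combined with "flexible reasoning".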

Abstract

Visual Question Answering (VQA) requires models to identify the correct answer options based on both visual and textual evidence. Recent Mixture-of-Experts (MoE) methods improve option reasoning by grouping similar concepts or routing based on examples. However, unstable routing can lead to inconsistent expert selection within the same question type, while overly stable routing may reduce flexibility. To address this, we propose a Concept-Guided Routing framework (CoGR-MoE), which incorporates the semantics of the answer options to guide expert selection during training. Next, option features are used to reweight the selected experts, producing discriminative representations for each candidate option. These option-level representations are then used for option comparison and optimized via contrastive learning. The experimental results indicate that CoGR-MoE delivers strong performance across multiple VQA tasks, demonstrating the effectiveness of our approach.
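The abstract does not give the form of the contrastive objective. As a hedged illustration only, the sketch below uses an InfoNCE-style loss in which the correct option is the positive and the remaining options are negatives; the fused question-image `query` vector, the cosine similarity, and the `temperature` value are assumptions, not details from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity; assumes both vectors are non-zero."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def option_contrastive_loss(query, option_reps, correct_idx, temperature=0.1):
    """InfoNCE-style loss: pull the correct option toward the query, push others away.

    query       : hypothetical fused question-image embedding (an assumption)
    option_reps : one option-level representation per candidate answer
    correct_idx : index of the ground-truth option
    """
    logits = [cosine(query, rep) / temperature for rep in option_reps]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[correct_idx]


def predict(query, option_reps):
    """Option comparison: choose the option most similar to the query."""
    sims = [cosine(query, rep) for rep in option_reps]
    return max(range(len(sims)), key=lambda i: sims[i])
```

Under these assumptions the loss is small when the correct option is already the closest one and grows when a wrong option is closer, so minimizing it sharpens the separation that `predict` relies on at comparison time.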