Sparse-Dense Mixture of Experts Adapter for Multi-Modal Tracking
arXiv cs.CV · March 17, 2026
📰 News · Models & Research
Key Points
- The paper introduces the Sparse-Dense Mixture of Experts Adapter (SDMoEA) for parameter-efficient fine-tuning (PEFT) in multi-modal tracking, addressing cross-modal heterogeneity under a unified model.
- It features an SDMoE module that pairs a sparsely routed MoE, which captures modality-specific information, with a dense shared MoE that captures cross-modal shared information; a minimal adapter sketch follows this list.
- A Gram-based Semantic Alignment Hypergraph Fusion (GSAHF) module is proposed to align semantics across modalities using Gram matrices and to enable high-order cross-modal fusion via hypergraphs; a Gram-alignment sketch is also shown below.
- Experiments on benchmarks such as LasHeR, RGBT234, VTUAV, VisEvent, COESOT, DepthTrack, and VOT-RGBD2022 show superior performance compared with other PEFT approaches.
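The sparse-dense split described in the second key point can be illustrated with a short PyTorch sketch: sparsely routed experts handle modality-specific adaptation while an always-active shared expert carries cross-modal information. This is a minimal sketch under assumed shapes and top-1 routing; the expert count, bottleneck size, and class names (`BottleneckExpert`, `SparseDenseMoEAdapter`) are illustrative, not the paper's exact configuration.

```python
# Minimal sketch of a sparse-dense mixture-of-experts adapter (assumptions:
# PyTorch, frozen backbone features of shape (batch, tokens, dim), top-1 routing).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckExpert(nn.Module):
    """Low-rank adapter expert: down-project, nonlinearity, up-project."""

    def __init__(self, dim: int, bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.gelu(self.down(x)))


class SparseDenseMoEAdapter(nn.Module):
    """Sparsely routed modality-specific experts plus a dense shared expert,
    added residually to the frozen backbone features."""

    def __init__(self, dim: int, num_sparse_experts: int = 4, bottleneck: int = 16):
        super().__init__()
        self.sparse_experts = nn.ModuleList(
            BottleneckExpert(dim, bottleneck) for _ in range(num_sparse_experts)
        )
        self.shared_expert = BottleneckExpert(dim, bottleneck)
        self.router = nn.Linear(dim, num_sparse_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) frozen backbone features
        weights = F.softmax(self.router(x), dim=-1)   # routing probabilities
        top_w, top_idx = weights.max(dim=-1)          # top-1 sparse routing

        sparse_out = torch.zeros_like(x)
        for e, expert in enumerate(self.sparse_experts):
            mask = top_idx == e                       # tokens routed to expert e
            if mask.any():
                sparse_out[mask] = top_w[mask].unsqueeze(-1) * expert(x[mask])

        dense_out = self.shared_expert(x)             # always-on shared expert
        return x + sparse_out + dense_out             # residual adaptation
```

In this sketch only the adapter parameters (experts and router) would be trained, which is what makes the scheme parameter-efficient relative to full fine-tuning of the backbone.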
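For the third key point, the Gram-matrix alignment idea can be sketched as a loss that matches second-order feature statistics between the RGB stream and the auxiliary modality (e.g., thermal, event, or depth). The hypergraph fusion step is not reproduced here, and the function names and normalization choices are assumptions for illustration.

```python
# Sketch of Gram-matrix based cross-modal semantic alignment (assumptions:
# per-modality token features of shape (batch, tokens, dim); hypergraph
# fusion omitted).
import torch
import torch.nn.functional as F


def gram_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Channel-wise Gram matrix of token features, normalized by token count."""
    b, t, d = feats.shape
    return feats.transpose(1, 2) @ feats / t          # (batch, dim, dim)


def gram_alignment_loss(rgb_feats: torch.Tensor,
                        aux_feats: torch.Tensor) -> torch.Tensor:
    """Encourage the two modalities to share second-order feature statistics."""
    g_rgb = gram_matrix(F.normalize(rgb_feats, dim=-1))
    g_aux = gram_matrix(F.normalize(aux_feats, dim=-1))
    return F.mse_loss(g_rgb, g_aux)
```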