Complementarity-Supervised Spectral-Band Routing for Multimodal Emotion Recognition
arXiv cs.CV / 3/17/2026
Models & Research
Key Points
- The paper argues that prior multimodal emotion recognition methods optimize each modality's performance independently and then apply coarse-grained fusion, which hinders cross-modal synergy.
- It proposes Atsuko, the Complementarity-Supervised Multi-Band Expert Network, which decomposes each modality into high-, mid-, and low-frequency components for fine-grained feature modeling.
- Atsuko introduces a modality-level router with a dual-path mechanism to enable fine-grained cross-band selection and cross-modal fusion.
- The Marginal Complementarity Module quantifies the performance loss from removing each modality via bi-modal comparison, providing soft supervision that guides the router toward each modality's unique information gain.
- Experiments on CMU-MOSI, CMU-MOSEI, CH-SIMS, CH-SIMSv2, and MIntRec demonstrate superior performance, validating the effectiveness of the approach.
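The two core mechanisms in the key points — per-modality frequency-band decomposition and complementarity-based soft supervision — can be sketched in a minimal form. This is an illustrative reconstruction, not the paper's implementation: the FFT-based band split, the cut fractions, the softmax router, and the leave-one-out normalization are all assumptions, since the digest does not specify them.

```python
import numpy as np

def split_bands(x, low_frac=0.25, high_frac=0.25):
    """Split a 1-D feature sequence into low/mid/high-frequency components
    via disjoint FFT masks (hypothetical stand-in for the paper's decomposition)."""
    spec = np.fft.rfft(x)
    n = len(spec)
    lo_cut = int(n * low_frac)
    hi_cut = int(n * (1 - high_frac))

    def band(a, b):
        masked = np.zeros_like(spec)
        masked[a:b] = spec[a:b]
        return np.fft.irfft(masked, n=len(x))

    return band(0, lo_cut), band(lo_cut, hi_cut), band(hi_cut, n)

def router_weights(logits):
    """Softmax gate over band experts (one gate per modality, illustratively)."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def marginal_complementarity(full_score, scores_without):
    """Soft-supervision target: normalized performance drop when each modality
    is removed (full tri-modal score vs. each bi-modal subset's score)."""
    drops = np.maximum(full_score - np.asarray(scores_without), 0.0)
    total = drops.sum()
    if total == 0:
        return np.full(len(scores_without), 1.0 / len(scores_without))
    return drops / total

# The three bands partition the spectrum, so they sum back to the input.
x = np.sin(np.linspace(0, 8 * np.pi, 64)) + 0.1 * np.random.randn(64)
lo, mid, hi = split_bands(x)
print(np.allclose(lo + mid + hi, x))

# Hypothetical accuracies: full model 0.80; without text/audio/vision: 0.72/0.78/0.75.
target = marginal_complementarity(0.80, [0.72, 0.78, 0.75])
print(target.round(3))
```

In this sketch, `target` would serve as the soft label for the router's modality weights (e.g. via a KL or cross-entropy term), nudging the router to weight modalities whose removal hurts performance most.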
Related Articles

PearlOS. We gave swarm intelligence a local desktop environment and code control to self-evolve. Has been pretty incredible to see so far. Open source and free if you want your own.
Reddit r/LocalLLaMA
QwenDean-4B | fine-tuned SLM for UIGen; our first attempt, looking for feedback!
Reddit r/LocalLLaMA
acestep.cpp: portable C++17 implementation of ACE-Step 1.5 music generation using GGML. Runs on CPU, CUDA, ROCm, Metal, Vulkan
Reddit r/LocalLLaMA
Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Hugging Face Blog

Newest GPU server in the lab! 72gb ampere vram!
Reddit r/LocalLLaMA