Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation
arXiv cs.AI / 3/16/2026
Key Points
- SERA (Spatio-Semantic Expert Routing Architecture) is a framework for referring image segmentation built around two components, SERA-Adapter and SERA-Fusion, which are designed to improve spatial coherence and boundary precision.
- SERA uses a lightweight, expression-aware routing mechanism together with parameter-efficient tuning. The tuning updates only normalization and bias terms (less than 1% of backbone parameters), which keeps the model compatible with pretrained encoders.
- SERA-Adapter inserts expression-conditioned adapters into selected backbone blocks, enabling expert-guided refinement and cross-modal attention. SERA-Fusion reshapes token features into spatial grids and applies geometry-preserving expert transformations to them before multimodal interaction.
- Experiments on standard benchmarks show that SERA consistently outperforms strong baselines, with notable gains on expressions requiring precise spatial localization and boundary delineation.
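The summary does not say how the expression-aware router is computed. As a rough illustration, the sketch below assumes a common mixture-of-experts design: a linear gate scores each expert from a pooled expression embedding, only the top-k experts are kept, and their outputs are blended with renormalized weights. The names `route_experts`, `mix_experts` and `gate_weights` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_experts(
    expr_embedding: Sequence[float],
    gate_weights: Sequence[Sequence[float]],  # hypothetical gate: one row per expert
    top_k: int = 2,
) -> List[float]:
    """Per-expert mixing weights conditioned on the expression embedding.

    Only the top_k experts keep a non-zero weight; those weights are renormalized to sum to 1.
    """
    logits = [sum(w * x for w, x in zip(row, expr_embedding)) for row in gate_weights]
    probs = softmax(logits)
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]


def mix_experts(
    feature: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    weights: Sequence[float],
) -> Vector:
    """Weighted sum of expert outputs; experts with zero weight are never evaluated."""
    out = [0.0] * len(feature)
    for expert, w in zip(experts, weights):
        if w == 0.0:
            continue
        y = expert(feature)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Toy experts standing in for learned sub-networks (illustrative only).
experts = [
    lambda f: f,                     # identity expert
    lambda f: [2.0 * v for v in f],  # scaling expert
    lambda f: [-v for v in f],       # sign-flip expert
]
gate = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
weights = route_experts([2.0, 0.0], gate, top_k=2)
mixed = mix_experts([1.0, 1.0], experts, weights)
```

Sparse top-k routing is used here because it keeps the per-token cost close to that of a single expert. A soft router that blends all experts would be an equally plausible reading of "lightweight".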
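To see why updating only normalization and bias terms stays well under 1% of the backbone, the sketch below counts parameters for one standard ViT encoder block (width 768, MLP ratio 4). The block layout and the PyTorch/timm-style parameter names are assumptions chosen for illustration, not SERA's actual backbone. The helper names `is_trainable`, `vit_block_param_sizes` and `trainable_fraction` are hypothetical.

```python
from typing import Dict, Tuple


def is_trainable(name: str) -> bool:
    """Hypothetical selection rule: keep biases and normalization parameters trainable.

    Assumes names like 'blocks.0.norm1.weight' or 'blocks.0.attn.qkv.bias'.
    """
    leaf = name.rsplit(".", 1)[-1]
    return leaf == "bias" or "norm" in name.lower()


def vit_block_param_sizes(dim: int = 768, mlp_ratio: int = 4, prefix: str = "blocks.0") -> Dict[str, int]:
    """Parameter counts for one standard ViT encoder block (illustrative, not SERA's backbone)."""
    hidden = dim * mlp_ratio
    return {
        f"{prefix}.norm1.weight": dim,
        f"{prefix}.norm1.bias": dim,
        f"{prefix}.attn.qkv.weight": dim * 3 * dim,
        f"{prefix}.attn.qkv.bias": 3 * dim,
        f"{prefix}.attn.proj.weight": dim * dim,
        f"{prefix}.attn.proj.bias": dim,
        f"{prefix}.norm2.weight": dim,
        f"{prefix}.norm2.bias": dim,
        f"{prefix}.mlp.fc1.weight": dim * hidden,
        f"{prefix}.mlp.fc1.bias": hidden,
        f"{prefix}.mlp.fc2.weight": hidden * dim,
        f"{prefix}.mlp.fc2.bias": dim,
    }


def trainable_fraction(param_sizes: Dict[str, int]) -> Tuple[Dict[str, int], float]:
    """Return the selected parameters and their share of all parameters."""
    trainable = {n: s for n, s in param_sizes.items() if is_trainable(n)}
    return trainable, sum(trainable.values()) / sum(param_sizes.values())


sizes = vit_block_param_sizes()
trainable, fraction = trainable_fraction(sizes)
```

For this block, 9,984 of 7,087,872 parameters are selected, or roughly 0.14%. The large projection matrices dominate the count, and they stay frozen. In PyTorch, the same rule could be applied by iterating over `model.named_parameters()` and setting `p.requires_grad = is_trainable(name)`.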
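The summary describes SERA-Adapter only as an expression-conditioned adapter with cross-modal attention. One plausible minimal form is a bottleneck adapter in which a visual token is down-projected, attends over word features, and is up-projected back onto the token through a residual connection. The sketch below shows that pattern for a single token and omits the expert-routing part. Every name in it (`expression_adapter`, `cross_attend`, `down`, `up`) is hypothetical, and the word features are assumed to be already projected to the bottleneck width.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product (rows of m dotted with v)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def cross_attend(query: Sequence[float], keys: Matrix, values: Matrix) -> Vector:
    """Single-head scaled dot-product attention of one query over word tokens."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    return [sum(a * v[j] for a, v in zip(attn, values)) for j in range(len(values[0]))]


def expression_adapter(
    visual_token: Sequence[float],
    word_feats: Matrix,  # assumed already at bottleneck width r
    down: Matrix,        # r x C down-projection (hypothetical)
    up: Matrix,          # C x r up-projection (hypothetical)
    scale: float = 1.0,
) -> Vector:
    """Hypothetical bottleneck adapter conditioned on the referring expression."""
    h = matvec(down, visual_token)                 # C -> r
    ctx = cross_attend(h, word_feats, word_feats)  # gather expression context
    h = [max(0.0, a + b) for a, b in zip(h, ctx)]  # fuse, then ReLU
    delta = matvec(up, h)                          # r -> C
    return [x + scale * d for x, d in zip(visual_token, delta)]  # residual update
```

With `up` initialized to zeros, the adapter returns its input unchanged. Starting adapters in this identity state is a common way to avoid disturbing a pretrained encoder at the beginning of tuning.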
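SERA-Fusion is described as reshaping token features into spatial grids before applying geometry-preserving expert transformations. The sketch below shows the reshape step for a ViT-style sequence, assuming one leading [CLS] token and row-major patch order. It then applies a 3x3 neighbourhood mean as a stand-in "expert" that keeps the H x W layout intact. The box filter is a simple substitute for illustration, not the paper's expert, and the function names are hypothetical.

```python
from typing import List

Tokens = List[List[float]]      # [N][C]
Grid = List[List[List[float]]]  # [H][W][C]


def tokens_to_grid(tokens: Tokens, height: int, width: int, num_prefix: int = 1) -> Grid:
    """Drop prefix tokens (e.g. [CLS]) and reshape row-major patch tokens to H x W x C."""
    patches = tokens[num_prefix:]
    if len(patches) != height * width:
        raise ValueError(f"expected {height * width} patch tokens, got {len(patches)}")
    return [patches[r * width:(r + 1) * width] for r in range(height)]


def grid_to_tokens(grid: Grid) -> Tokens:
    """Flatten an H x W x C grid back to a row-major token sequence."""
    return [cell for row in grid for cell in row]


def box_filter_expert(grid: Grid) -> Grid:
    """Stand-in geometry-preserving expert: mean over the valid 3x3 neighbourhood.

    The output keeps the H x W layout, so each cell still corresponds to the same image patch.
    """
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    out: Grid = []
    for r in range(h):
        row = []
        for q in range(w):
            acc = [0.0] * c
            count = 0
            for dr in (-1, 0, 1):
                for dq in (-1, 0, 1):
                    rr, qq = r + dr, q + dq
                    if 0 <= rr < h and 0 <= qq < w:  # skip out-of-bounds neighbours
                        acc = [a + v for a, v in zip(acc, grid[rr][qq])]
                        count += 1
            row.append([a / count for a in acc])
        out.append(row)
    return out


# One [CLS] token followed by a 2 x 3 grid of single-channel patch tokens.
tokens = [[99.0]] + [[float(i)] for i in range(6)]
grid = tokens_to_grid(tokens, height=2, width=3)
smoothed = box_filter_expert(grid)
```

Operating on the 2-D grid gives each patch access to its actual spatial neighbours, whereas a flat sequence only makes horizontal neighbours adjacent. This is one way to read "geometry-preserving" in the summary.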


