BLOSSOM: Block-wise Federated Learning Over Shared and Sparse Observed Modalities

arXiv cs.LG / March 31, 2026


Key Points

  • The paper introduces BLOSSOM, a task-agnostic multimodal federated learning framework built for realistic conditions in which clients hold differing, and often incomplete, sets of modalities.
  • BLOSSOM allows flexible sharing of model components, using a block-wise strategy that aggregates shared blocks across clients while keeping task-specific blocks private for partial personalization (see the sketch after this list).
  • The approach is designed to handle both client heterogeneity and task heterogeneity more effectively than methods that assume uniform modality availability.
  • Experiments across multiple multimodal datasets show that block-wise personalization can substantially improve performance under severe modality sparsity.
  • Reported gains include an average 18.7% improvement over full-model aggregation in modality-incomplete settings and 37.7% in modality-exclusive scenarios, underscoring BLOSSOM’s practical value for multimodal FL deployments.
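
The summary does not spell out the aggregation rule, but the key points suggest a simple server-side scheme: average only the parameters of blocks designated as shared, and never collect the private, task-specific blocks. The sketch below is an illustrative reconstruction under that assumption; the function name `blockwise_aggregate` and the prefix-based block naming are hypothetical, not from the paper.

```python
from collections import defaultdict

import torch


def blockwise_aggregate(client_states, shared_block_prefixes):
    """Average the parameters of shared blocks across clients.

    client_states: list of dicts (parameter name -> tensor) holding only
        the shared blocks each client actually has; task-specific blocks
        are never uploaded, so they remain private on-device.
    shared_block_prefixes: parameter-name prefixes marking shared blocks.
    """
    sums, counts = defaultdict(lambda: None), defaultdict(int)
    for state in client_states:
        for name, param in state.items():
            # Aggregate only shared-block parameters. A client missing a
            # modality simply contributes nothing to that modality's blocks,
            # which is how sparse modality observation is tolerated.
            if any(name.startswith(p) for p in shared_block_prefixes):
                sums[name] = param.clone() if sums[name] is None else sums[name] + param
                counts[name] += 1
    return {name: total / counts[name] for name, total in sums.items()}
```

Averaging each parameter over only the clients that hold it (rather than over all clients) is what lets the scheme cope with modality sparsity without imputing missing blocks.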

Abstract

Multimodal federated learning (FL) is essential for real-world applications such as autonomous systems and healthcare, where data is distributed across heterogeneous clients with varying and often missing modalities. However, most existing FL approaches assume uniform modality availability, limiting their applicability in practice. We introduce BLOSSOM, a task-agnostic framework for multimodal FL designed to operate under shared and sparsely observed modality conditions. BLOSSOM supports clients with arbitrary modality subsets and enables flexible sharing of model components. To address client and task heterogeneity, we propose a block-wise aggregation strategy that selectively aggregates shared components while keeping task-specific blocks private, enabling partial personalization. We evaluate BLOSSOM on multiple diverse multimodal datasets and analyze the effects of missing modalities and personalization. Our results show that block-wise personalization significantly improves performance, particularly in settings with severe modality sparsity. In modality-incomplete scenarios, BLOSSOM achieves an average performance gain of 18.7% over full-model aggregation, while in modality-exclusive settings the gain increases to 37.7%, highlighting the importance of block-wise learning for practical multimodal FL systems.
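
On the client side, supporting arbitrary modality subsets typically means instantiating one encoder per locally observed modality and fusing whatever embeddings are available before a private task head. The minimal sketch below illustrates that pattern under assumptions of mine (mean-pooling fusion, the `PartialModalityClient` name); the paper's actual architecture may differ.

```python
import torch
import torch.nn as nn


class PartialModalityClient(nn.Module):
    """Hypothetical client model: one encoder per locally observed modality,
    plus a task-specific head that is kept private (never aggregated)."""

    def __init__(self, encoders: dict, head: nn.Module):
        super().__init__()
        self.encoders = nn.ModuleDict(encoders)  # e.g. {"image": ..., "audio": ...}
        self.head = head  # private block, excluded from server aggregation

    def forward(self, inputs: dict):
        # Encode only the modalities present in this batch and mean-pool the
        # embeddings, so any non-empty modality subset yields a prediction.
        embs = [enc(inputs[m]) for m, enc in self.encoders.items() if m in inputs]
        fused = torch.stack(embs, dim=0).mean(dim=0)
        return self.head(fused)
```

In a federated round, only the encoder parameters (the shared blocks) would be uploaded for the block-wise averaging sketched earlier, while `self.head` stays on-device, giving the partial personalization the abstract describes.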