Communication-Efficient and Robust Multi-Modal Federated Learning via Latent-Space Consensus

arXiv cs.LG / 3/20/2026

📰 News · Models & Research

Key Points

  • Introduces CoMFed, a communication-efficient multi-modal federated learning framework that uses learnable projection matrices to create compressed latent representations.
  • A latent-space regularizer aligns representations across clients to improve cross-modal consistency and robustness to outliers.
  • The approach addresses heterogeneity in modalities and model architectures while preserving privacy and reducing communication overhead.
  • Experimental results on human activity recognition benchmarks show competitive accuracy with minimal overhead.
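The core compression idea above — a learnable projection matrix per modality mapping encoder features into a shared, lower-dimensional latent space — can be sketched as follows. The latent dimension, modality names, and feature sizes here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each client's modality encoder emits features of a
# different width; a learnable projection W_m maps every modality into a
# shared d-dimensional latent space, which is what gets communicated.
d_latent = 16
feature_dims = {"imu": 64, "audio": 128}  # illustrative modality widths

# Per-modality projection matrices (the learnable parameters); scaled
# initialization keeps latent magnitudes comparable across modalities.
projections = {
    m: rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, d_latent))
    for m, dim in feature_dims.items()
}

def project(modality: str, features: np.ndarray) -> np.ndarray:
    """Compress raw modality features into the shared latent space."""
    return features @ projections[modality]

# Heterogeneous inputs land in the same compact space:
z_imu = project("imu", rng.normal(size=(32, 64)))     # batch of 32 samples
z_audio = project("audio", rng.normal(size=(32, 128)))
```

Because only the `d_latent`-wide representations (and small projection matrices) are exchanged rather than full model weights or raw features, the communication cost per round stays low even when clients run different encoder architectures.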

Abstract

Federated learning (FL) enables collaborative model training across distributed devices without sharing raw data, but applying FL to multi-modal settings introduces significant challenges. Clients typically possess heterogeneous modalities and model architectures, making it difficult to align feature spaces efficiently while preserving privacy and minimizing communication costs. To address this, we introduce CoMFed, a Communication-Efficient Multi-Modal Federated Learning framework that uses learnable projection matrices to generate compressed latent representations. A latent-space regularizer aligns these representations across clients, improving cross-modal consistency and robustness to outliers. Experiments on human activity recognition benchmarks show that CoMFed achieves competitive accuracy with minimal overhead.
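The abstract's latent-space regularizer could take many forms; the paper's exact loss is not given in this summary. One plausible sketch, assuming a Huber-style penalty between a client's mean latent representation and a server-side consensus vector (the saturating branch is what would limit the influence of outlier clients):

```python
import numpy as np

def alignment_regularizer(z_client: np.ndarray,
                          z_consensus: np.ndarray,
                          delta: float = 1.0) -> float:
    """Hypothetical latent-alignment penalty.

    Quadratic near the consensus (encouraging cross-modal/client
    consistency), linear beyond `delta` so outlier clients cannot
    dominate the aggregate objective.
    """
    r = float(np.linalg.norm(z_client.mean(axis=0) - z_consensus))
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

# An outlier client far from consensus incurs only a linearly growing
# penalty, while a well-aligned client's penalty shrinks quadratically.
z_near = np.zeros((8, 16)) + 0.01
z_far = np.zeros((8, 16)) + 10.0
consensus = np.zeros(16)
loss_near = alignment_regularizer(z_near, consensus)
loss_far = alignment_regularizer(z_far, consensus)
```

This term would be added to each client's local training loss, pulling compressed representations toward a shared geometry without exchanging raw data.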