Contrastive Learning for Multimodal Human Activity Recognition with Limited Labeled Data
arXiv cs.LG / 4/28/2026
📰 News · Models & Research
Key Points
- The paper addresses multimodal human activity recognition where modalities are heterogeneous and labeled data are scarce, and proposes an approach called CLMM to close this gap.
- CLMM introduces a two-stage training pipeline that first learns cross-modal shared representations using a CNN-DiffTransformer encoder and hard-positive sample weighting to strengthen shared gradients (a sketch of such a weighted contrastive loss follows this list).
- In the second stage, it learns modality-specific features via a dual-branch design with quality-guided attention and bidirectional gated units, then combines shared and modality-specific knowledge through primary–auxiliary collaborative training (see the second sketch below).
- Experiments on three public datasets show that CLMM outperforms state-of-the-art baselines in both recognition accuracy and convergence behavior.
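
The summary does not give CLMM's exact objective, so the following is a minimal sketch of a hard-positive-weighted cross-modal contrastive loss in PyTorch, assuming an InfoNCE-style formulation; the specific weighting rule (upweighting positives that are harder to align) is a hypothetical reading of "hard-positive sample weighting", not the paper's formula.

```python
import torch
import torch.nn.functional as F

def weighted_infonce(z_a, z_b, temperature=0.1):
    """Cross-modal contrastive loss with hard-positive weighting.

    z_a, z_b: (N, D) embeddings of the same N samples from two modalities,
    e.g. z_a = imu_encoder(x_imu), z_b = skeleton_encoder(x_skel).
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature        # (N, N) similarity matrix
    pos_sim = logits.diag()                     # similarities of matched pairs

    # Hypothetical weighting: positives with LOWER similarity (harder to
    # align across modalities) get larger weight, amplifying the shared
    # gradient signal. Weights are normalized to average ~1 over the batch.
    with torch.no_grad():
        w = torch.softmax(-pos_sim, dim=0) * len(pos_sim)

    targets = torch.arange(len(z_a), device=z_a.device)
    per_sample = F.cross_entropy(logits, targets, reduction='none')
    return (w * per_sample).mean()
```

Similarly, here is a minimal sketch of what the second-stage dual-branch fusion might look like, assuming "quality-guided attention" means a learned per-modality quality score that weights each branch, and "bidirectional gated units" exchange information between the two branches; all module names and the wiring are illustrative, and the paper's architecture may differ.

```python
import torch
import torch.nn as nn

class GatedBidirectionalFusion(nn.Module):
    """Dual-branch fusion: gated cross-branch exchange + quality weighting."""

    def __init__(self, dim):
        super().__init__()
        self.quality = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.gate_ab = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.gate_ba = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_a, h_b):
        # Per-modality quality scores in (0, 1) decide how much each
        # modality-specific branch contributes to the fused output.
        q_a, q_b = self.quality(h_a), self.quality(h_b)

        # Bidirectional gates control how much of the other branch flows in.
        g_ab = self.gate_ab(torch.cat([h_a, h_b], dim=-1))
        g_ba = self.gate_ba(torch.cat([h_a, h_b], dim=-1))
        h_a2 = h_a + g_ba * h_b          # information flowing b -> a
        h_b2 = h_b + g_ab * h_a          # information flowing a -> b

        # Quality-guided combination of the two refined branches.
        w = torch.softmax(torch.cat([q_a, q_b], dim=-1), dim=-1)
        return w[..., :1] * h_a2 + w[..., 1:] * h_b2
```

In such a design, the gates let a degraded modality (e.g. noisy IMU readings) borrow evidence from the stronger branch, while the quality scores keep the final representation dominated by the more reliable modality.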