Unlocking Multi-Site Clinical Data: A Federated Approach to Privacy-First Child Autism Behavior Analysis

arXiv cs.CV / 4/6/2026


Key Points

  • The paper addresses the challenge of training automated child autism behavior recognition models when privacy regulations and pediatric sensitivity prevent centralized aggregation of clinical data across sites.
  • It proposes a federated learning framework that keeps sensitive pose-related data within each clinic while still learning generalized representations from multi-site participation.
  • To further protect privacy, the method includes a two-layer protection scheme that uses human skeletal abstraction to strip identifiable visual information from raw RGB video inputs before federated training.
  • Experiments on the MMASD benchmark show the approach achieves high recognition accuracy and outperforms traditional federated baselines.
  • The authors position the framework as both privacy-first and adaptable, enabling generalized learning plus site-specific personalization to handle distribution shifts across clinics.
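The paper does not publish its training code, but the core federated idea in the points above — each clinic trains locally and only model weights leave the site — follows the standard FedAvg pattern. Below is a minimal, hypothetical sketch using a logistic-regression client as a stand-in for the pose-based behavior classifier; `local_update`, `fedavg`, and all hyperparameters are illustrative choices, not the authors' implementation.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One client's local training step: plain gradient descent on a
    logistic-regression loss, using only that clinic's own data."""
    w = weights.copy()
    for _ in range(epochs):
        probs = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (probs - y) / len(y)
        w -= lr * grad
    return w

def fedavg(client_data, dim, rounds=10):
    """Server loop: broadcast the global weights, collect each client's
    locally trained weights, and average them weighted by client size.
    Raw data never leaves the clients -- only the weight vectors do."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in client_data:
            updates.append(local_update(global_w, X, y))
            sizes.append(len(y))
        global_w = np.average(updates, axis=0, weights=np.array(sizes, float))
    return global_w
```

Site-specific personalization, as described in the last point, would then amount to a few extra `local_update` steps on the returned global weights at each clinic.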

Abstract

Automated recognition of autistic behaviors in children is essential for early intervention and objective clinical assessment. However, the development of robust models is severely hindered by strict privacy regulations (e.g., HIPAA) and the sensitive nature of pediatric data, which prevents the centralized aggregation of clinical datasets. Furthermore, individual clinical sites often suffer from data scarcity, making it difficult to learn generalized behavior patterns or tailor models to site-specific patient distributions. To address these challenges, we observe that Federated Learning (FL) can decouple model training from raw data access, enabling multi-site collaboration while maintaining strict data residency. In this paper, we present the first study exploring Federated Learning for pose-based child autism behavior recognition. Our framework employs a two-layer privacy protection mechanism: utilizing human skeletal abstraction to remove identifiable visual information from the raw RGB videos and FL to ensure sensitive pose data remains within the clinic. This approach leverages distributed clinical data to learn generalized representations while providing the flexibility for site-specific personalization. Experimental results on the MMASD benchmark demonstrate that our framework achieves high recognition accuracy, outperforming traditional federated baselines and providing a robust, privacy-first solution for multi-site clinical analysis.
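The first privacy layer described in the abstract, skeletal abstraction, reduces each RGB frame to joint coordinates so no facial or appearance information survives. A minimal sketch of what that representation could look like is below; it assumes per-frame 2-D keypoints have already been extracted by some pose estimator (e.g., OpenPose or MediaPipe would produce such arrays — the estimator itself, and this normalization scheme, are illustrative assumptions rather than the paper's exact pipeline).

```python
import numpy as np

def skeletonize(keypoints):
    """Normalize per-frame joint coordinates so only motion remains.

    keypoints: array of shape (T, J, 2) -- T frames, J joints, (x, y).
    All RGB appearance is discarded upstream; here we also remove
    absolute position and scale, which are weakly identifying.
    """
    kp = np.asarray(keypoints, dtype=float)
    # Center each frame on its mean joint position (removes location).
    kp = kp - kp.mean(axis=1, keepdims=True)
    # Divide by per-frame joint spread (removes body size / camera distance).
    scale = np.linalg.norm(kp, axis=(1, 2), keepdims=True)
    return kp / np.maximum(scale, 1e-8)
```

Each clinic would run this abstraction locally before federated training, so the federated layer only ever sees anonymized skeleton sequences, never raw video.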