Stress Estimation in Elderly Oncology Patients Using Visual Wearable Representations and Multi-Instance Learning

arXiv cs.LG / 4/9/2026

Key Points

  • The study estimates perceived psychological stress in elderly breast cancer patients from continuous multimodal wearable data (smartwatch activity and sleep, plus a chest-worn ECG) rather than relying solely on intermittent patient-reported outcome measure (PROM) questionnaires.
  • It converts heterogeneous wearable time series into visual representations and trains a weakly supervised, attention-based multiple instance learning (MIL) model in which a single Perceived Stress Scale (PSS) score labels many unlabeled signal windows.
  • A lightweight pretrained mixture-of-experts backbone (Tiny-BioMoE) produces 192-dimensional embeddings for each wearable representation, which are aggregated to predict PSS at 3 and 6 months.
  • In leave-one-subject-out (LOSO) evaluation on the multicenter CARDIOCARE cohort, predictions show moderate agreement with questionnaire-based stress scores (month 3, M3: Pearson r=0.42; month 6, M6: Pearson r=0.49), with RMSE/MAE of 6.62/6.07 at M3 and 6.13/5.54 at M6.
  • The approach aims to bring stress monitoring into cardiotoxicity surveillance in cardio-oncology by enabling more continuous, wearable-based assessment.
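The attention-based MIL step summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard (non-gated) attention pooling of Ilse et al. (2018), which the summary does not confirm, and the attention size ATTN_DIM, the linear regression head, and the random stand-in embeddings are all hypothetical. Only the 192-dimensional embedding size comes from the text.

```python
import numpy as np

EMBED_DIM = 192  # embedding size reported for Tiny-BioMoE
ATTN_DIM = 64    # hypothetical hidden size of the attention scorer (assumption)

rng = np.random.default_rng(0)

# Hypothetical, untrained parameters. In a real pipeline these would be learned
# by regressing each pooled bag embedding onto its subject's PSS score.
V = rng.normal(scale=0.05, size=(ATTN_DIM, EMBED_DIM))  # attention projection
w = rng.normal(scale=0.05, size=ATTN_DIM)               # attention scoring vector
head_w = rng.normal(scale=0.05, size=EMBED_DIM)         # linear regression head
head_b = 0.0


def attention_mil_pool(instances: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pool a bag of window embeddings, shape (n_windows, EMBED_DIM), into one vector.

    score_k = w^T tanh(V h_k); weights = softmax(scores); bag = sum_k weight_k * h_k.
    """
    scores = np.tanh(instances @ V.T) @ w   # one scalar score per window
    scores = scores - scores.max()          # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()       # attention weights sum to 1
    bag = weights @ instances               # weighted average of the embeddings
    return bag, weights


def predict_pss(instances: np.ndarray) -> float:
    """Map a bag of window embeddings to a scalar PSS estimate (untrained head)."""
    bag, _ = attention_mil_pool(instances)
    return float(bag @ head_w + head_b)


# One bag = all windows from one subject at one time point; random vectors stand
# in for Tiny-BioMoE embeddings of the visual representations.
bag_instances = rng.normal(size=(300, EMBED_DIM))
bag, weights = attention_mil_pool(bag_instances)
pred = predict_pss(bag_instances)
print(bag.shape, weights.shape)  # (192,) (300,)
```

Because the softmax weights are computed per window and then summed, the bag prediction does not depend on window order, which is what allows one PSS score to supervise an unordered set of windows; the learned weights also indicate which windows the model relied on most.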

Abstract

Psychological stress is clinically relevant in cardio-oncology, yet it is typically assessed only through patient-reported outcome measures (PROMs) and is rarely integrated into continuous cardiotoxicity surveillance. We estimate perceived stress in an elderly, multicenter breast cancer cohort (CARDIOCARE) using multimodal wearable data from a smartwatch (physical activity and sleep) and a chest-worn ECG sensor. Wearable streams are transformed into heterogeneous visual representations, yielding a weakly supervised setting in which a single Perceived Stress Scale (PSS) score corresponds to many unlabeled windows. A lightweight pretrained mixture-of-experts backbone (Tiny-BioMoE) embeds each representation into 192-dimensional vectors, which are aggregated via attention-based multiple instance learning (MIL) to predict PSS at month 3 (M3) and month 6 (M6). Under leave-one-subject-out (LOSO) evaluation, predictions showed moderate agreement with questionnaire scores (M3: R^2=0.24, Pearson r=0.42, Spearman rho=0.48; M6: R^2=0.28, Pearson r=0.49, Spearman rho=0.52), with global RMSE/MAE of 6.62/6.07 at M3 and 6.13/5.54 at M6.
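The LOSO protocol and the reported agreement metrics can be made concrete with the sketch below. It is illustrative only: the data are synthetic, the ordinary least-squares model is a hypothetical stand-in for the Tiny-BioMoE + MIL pipeline, and R^2 uses the common definition 1 - SS_res/SS_tot, which the abstract does not confirm the authors used. "Global" RMSE/MAE is read here as being computed once over the pooled out-of-fold predictions of all subjects (an assumption).

```python
import numpy as np


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks in which tied values share their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def regression_metrics(y_true, y_pred) -> dict:
    """Agreement between questionnaire scores and model predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "pearson_r": float(np.corrcoef(y_true, y_pred)[0, 1]),
        # Spearman's rho is Pearson's r computed on (tie-averaged) ranks.
        "spearman_rho": float(np.corrcoef(average_ranks(y_true), average_ranks(y_pred))[0, 1]),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }


def loso_predictions(features, targets, subjects, fit, predict) -> np.ndarray:
    """Out-of-fold predictions; each fold holds out every row of one subject."""
    preds = np.empty(len(targets), dtype=float)
    for subject in np.unique(subjects):
        held_out = subjects == subject
        model = fit(features[~held_out], targets[~held_out])
        preds[held_out] = predict(model, features[held_out])
    return preds


# Hypothetical stand-in model: ordinary least squares with an intercept.
def fit_linear(X, y):
    design = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(design, y, rcond=None)[0]


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Synthetic data: one pooled feature vector and one score per subject.
rng = np.random.default_rng(0)
n_subjects = 30
X = rng.normal(size=(n_subjects, 4))
y = 15 + X @ np.array([3.0, -2.0, 1.0, 0.5]) + rng.normal(scale=2.0, size=n_subjects)
subjects = np.arange(n_subjects)

preds = loso_predictions(X, y, subjects, fit_linear, predict_linear)
metrics = regression_metrics(y, preds)
print({name: round(value, 2) for name, value in metrics.items()})
```

Holding out all rows of a subject at once keeps windows or time points from the same patient from appearing on both sides of a split, which is the point of LOSO over a random split. Pearson's r and Spearman's rho measure only association, so predictions can correlate moderately with the questionnaire scores while still being offset from them; RMSE, MAE and R^2 capture that offset.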