An Edge-Host-Cloud Architecture for Robot-Agnostic, Caregiver-in-the-Loop Personalized Cognitive Exercise: Multi-Site Deployment in Dementia Care

arXiv cs.RO / 4/21/2026


Key Points

  • The paper introduces “Speaking Memories,” a robot-agnostic, distributed platform for personalized cognitive exercise that keeps caregivers and family members in the loop.
  • It uses an edge-host-cloud socio-technical architecture that combines caregiver-authored biographical knowledge (via a secure cloud portal) with local edge intelligence to drive emotion-aware, multimodal dialogue.
  • The system decouples perception and reasoning from specific robot hardware to support low-latency, privacy-preserving operation across heterogeneous robotic embodiments.
  • It adds an automated multimodal evaluation layer that derives structured interaction metrics at scale from user responses and affective cues, supporting assessment, model fine-tuning, and future clinician- and caregiver-informed intervention planning.
  • Real-world multi-site deployments report sub-6-second response latency, robust multimodal synchronization, stable interactions, and positive usability/engagement feedback, with controlled dataset sharing under consent and IRB constraints.
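The decoupling described above can be illustrated with a short sketch: an edge interaction server that runs the dialogue policy locally and talks to any robot through a hardware-agnostic interface. This is a minimal, hypothetical illustration of the adapter-style pattern such an architecture implies; all class and method names are invented here and are not from the paper.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    emotion: str  # affective label chosen by the edge-side dialogue policy

class RobotEmbodiment(ABC):
    """Hardware-specific adapter; the edge server only sees this interface."""
    @abstractmethod
    def speak(self, utterance: Utterance) -> str: ...

class MockHumanoid(RobotEmbodiment):
    def speak(self, utterance: Utterance) -> str:
        return f"[humanoid/{utterance.emotion}] {utterance.text}"

class MockTabletBot(RobotEmbodiment):
    def speak(self, utterance: Utterance) -> str:
        return f"[tablet/{utterance.emotion}] {utterance.text}"

class EdgeInteractionServer:
    """Runs perception and reasoning locally; dispatches to any embodiment."""
    def __init__(self, robot: RobotEmbodiment, biography: dict[str, str]):
        self.robot = robot
        # Caregiver-authored biographical knowledge, as synced from a cloud portal.
        self.biography = biography

    def respond(self, user_text: str) -> str:
        # Toy dialogue policy: condition the reply on biographical topics.
        topic = next((t for t in self.biography if t in user_text.lower()), None)
        if topic:
            reply = Utterance(f"You mentioned {topic}: {self.biography[topic]}", "warm")
        else:
            reply = Utterance("Tell me more about that.", "neutral")
        return self.robot.speak(reply)
```

Because the server depends only on the `RobotEmbodiment` interface, swapping `MockHumanoid` for `MockTabletBot` changes nothing in the perception or dialogue code, which is the property the paper's robot-agnostic design targets.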

Abstract

We present Speaking Memories, a distributed, stakeholder-in-the-loop robotic interaction platform for personalized cognitive exercise support. Rather than a single robot-centric system, Speaking Memories is designed as a generalizable robotics architecture that integrates caregiver-authored knowledge, local edge intelligence, and embodied robotic agents into a unified socio-technical loop. The platform fuses auditory, visual, and textual signals to enable emotion-aware, personalized dialogue, while decoupling multimodal perception and reasoning from robot-specific hardware through a local edge interaction server. This design achieves low-latency, privacy-preserving operation and supports scalable deployment across heterogeneous robotic embodiments. Caregivers and family members contribute structured biographical knowledge via a secure cloud portal, which conditions downstream dialogue policies and enables longitudinal personalization across interaction sessions. Beyond real-time interaction, the system incorporates an automated multimodal evaluation layer that continuously analyzes user responses, affective cues, and engagement patterns, producing structured interaction metrics at scale. These metrics support systematic assessment of interaction quality, enable data-driven model fine-tuning, and lay the foundation for future clinician- and caregiver-informed personalization and intervention planning. We evaluate the platform through real-world deployments, measuring end-to-end latency, dialogue coherence, interaction stability, and stakeholder-reported usability and engagement. Results demonstrate sub-6-second response latency, robust multimodal synchronization, and consistently positive feedback from both participants and caregivers. Furthermore, subsets of the dataset can be shared upon request, subject to participant consent and IRB constraints.
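The automated evaluation layer the abstract describes can be sketched as a function that aggregates per-turn interaction logs into structured session metrics. The field names and the specific metrics below are illustrative assumptions; only the sub-6-second latency target comes from the reported results.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Turn:
    response_latency_s: float  # end-to-end time from user speech to robot reply
    user_responded: bool       # did the user engage with this turn?
    affect: str                # coarse affective label from multimodal cues

def session_metrics(turns: list[Turn], latency_budget_s: float = 6.0) -> dict:
    """Collapse a session's turn log into structured interaction metrics."""
    latencies = [t.response_latency_s for t in turns]
    n = len(turns)
    return {
        "turns": n,
        "mean_latency_s": round(mean(latencies), 2),
        "within_budget_ratio": sum(l < latency_budget_s for l in latencies) / n,
        "response_rate": sum(t.user_responded for t in turns) / n,
        "positive_affect_ratio": sum(t.affect == "positive" for t in turns) / n,
    }
```

Metrics of this shape could then feed the downstream uses the abstract names: quality assessment across sessions, data-driven fine-tuning, and clinician- or caregiver-informed intervention planning.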