AI Navigate

Edge-Efficient Two-Stream Multimodal Architecture for Non-Intrusive Bathroom Fall Detection

arXiv cs.CV / 3/19/2026

Key Points

  • The paper proposes a two-stream architecture that separately encodes radar motion with a Motion-Mamba branch and floor vibration with an Impact-Griffin branch, using cross-conditioned fusion and a Switch-MoE head to align tokens and suppress confounders.
  • It runs in real time on a Raspberry Pi 4B gateway, cutting latency from 35.9 ms to 15.8 ms and energy per 2.56 s window from 14,200 mJ to 10,750 mJ relative to the strongest baseline.
  • The authors built a bathroom fall detection benchmark with more than 3 hours of synchronized mmWave radar and triaxial vibration data across eight scenarios, with subject-independent train/validation/test splits. On the test split, the model achieves 96.1% accuracy, 94.8% precision, 88.0% recall, 91.1% macro F1, and an AUC of 0.968.
  • Compared with the strongest baseline, the model improves accuracy by 2.0 percentage points and fall recall by 1.3 percentage points while also reducing latency and energy cost, targeting privacy-preserving, non-intrusive safety monitoring in wet bathroom environments.
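
The latency and energy figures quoted for the Raspberry Pi 4B can be turned into relative savings and an average power draw with a little arithmetic. The sketch below uses only numbers from the abstract. The helper names `relative_reduction` and `average_power_w` are introduced here for illustration, and the power figure assumes the reported energy is spread evenly over the 2.56 s window, which the source does not state.

```python
# Figures quoted in the abstract (Raspberry Pi 4B gateway).
WINDOW_S = 2.56
BASELINE = {"latency_ms": 35.9, "energy_mj": 14200.0}
PROPOSED = {"latency_ms": 15.8, "energy_mj": 10750.0}


def relative_reduction(before, after):
    """Fraction by which `after` is smaller than `before`."""
    return (before - after) / before


def average_power_w(energy_mj, window_s=WINDOW_S):
    """Average power in watts, ASSUMING energy is spent uniformly over the window."""
    return (energy_mj / 1000.0) / window_s


latency_cut = relative_reduction(BASELINE["latency_ms"], PROPOSED["latency_ms"])
energy_cut = relative_reduction(BASELINE["energy_mj"], PROPOSED["energy_mj"])
baseline_power = average_power_w(BASELINE["energy_mj"])
proposed_power = average_power_w(PROPOSED["energy_mj"])

print(f"latency reduction: {latency_cut:.1%}")
print(f"energy reduction:  {energy_cut:.1%}")
print(f"average power:     {baseline_power:.2f} W -> {proposed_power:.2f} W")
```

Under these assumptions, the model cuts latency by roughly 56% and energy per window by roughly 24%, which corresponds to about 5.5 W for the baseline versus about 4.2 W for the proposed model.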

Abstract

Falls in wet bathroom environments are a major safety risk for seniors living alone. Recent work has shown that mmWave-only, vibration-only, and existing multimodal schemes, such as vibration-triggered radar activation, early feature concatenation, and decision-level score fusion, can support privacy-preserving, non-intrusive fall detection. However, these designs still treat motion and impact as loosely coupled streams, depending on coarse temporal alignment and amplitude thresholds, and do not explicitly encode the causal link between radar-observed collapse and floor impact or address timing drift, object-drop confounders, and latency and energy constraints on low-power edge devices. To this end, we propose a two-stream architecture that encodes radar signals with a Motion-Mamba branch for long-range motion patterns and processes floor vibration with an Impact-Griffin branch that emphasizes impact transients and cross-axis coupling. Cross-conditioned fusion uses low-rank bilinear interaction and a Switch-MoE head to align motion and impact tokens and suppress object-drop confounders. The model keeps inference cost suitable for real-time execution on a Raspberry Pi 4B gateway. We construct a bathroom fall detection benchmark dataset with frame-level annotations, comprising more than 3 h of synchronized mmWave radar and triaxial vibration recordings across eight scenarios under running water, together with subject-independent training, validation, and test splits. On the test split, our model attains 96.1% accuracy, 94.8% precision, 88.0% recall, a 91.1% macro F1 score, and an AUC of 0.968. Compared with the strongest baseline, it improves accuracy by 2.0 percentage points and fall recall by 1.3 percentage points, while reducing latency from 35.9 ms to 15.8 ms and lowering energy per 2.56 s window from 14200 mJ to 10750 mJ on the Raspberry Pi 4B gateway.
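
The abstract names low-rank bilinear interaction as the fusion mechanism but does not give its exact form. The sketch below illustrates one common factorization, in which each stream is projected to a shared rank-r space, the projections are multiplied elementwise, and the result is projected to the output size. It is not the authors' implementation: the function names, dimensions, and random weights are all assumptions made for illustration, and the Switch-MoE head and any nonlinearities are omitted.

```python
import random


def matvec_t(weights, vec):
    """Return W^T v, where W is a list of len(vec) rows, each with k columns."""
    cols = len(weights[0])
    return [sum(weights[i][k] * vec[i] for i in range(len(vec))) for k in range(cols)]


def low_rank_bilinear(motion, impact, U, V, P):
    """Fuse two token vectors as P^T ((U^T motion) * (V^T impact)).

    U: motion_dim x rank, V: impact_dim x rank, P: rank x out_dim.
    All shapes are hypothetical; the paper does not specify them here.
    """
    motion_proj = matvec_t(U, motion)    # length = rank
    impact_proj = matvec_t(V, impact)    # length = rank
    joint = [m * v for m, v in zip(motion_proj, impact_proj)]  # elementwise product
    return matvec_t(P, joint)            # length = out_dim


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
MOTION_DIM, IMPACT_DIM, RANK, OUT_DIM = 6, 4, 3, 5  # illustrative sizes only

U = random_matrix(MOTION_DIM, RANK, rng)
V = random_matrix(IMPACT_DIM, RANK, rng)
P = random_matrix(RANK, OUT_DIM, rng)

motion_token = [rng.uniform(-1.0, 1.0) for _ in range(MOTION_DIM)]
impact_token = [rng.uniform(-1.0, 1.0) for _ in range(IMPACT_DIM)]

fused = low_rank_bilinear(motion_token, impact_token, U, V, P)
fused_no_impact = low_rank_bilinear(motion_token, [0.0] * IMPACT_DIM, U, V, P)
fused_double_motion = low_rank_bilinear([2.0 * x for x in motion_token], impact_token, U, V, P)
print(len(fused), fused_no_impact)
```

Because the two projections are multiplied, the fused vector is zero whenever either stream contributes nothing, and it scales linearly with each input separately. That multiplicative coupling is what distinguishes bilinear fusion from plain feature concatenation. The factorization needs (MOTION_DIM + IMPACT_DIM + OUT_DIM) × RANK weights, whereas a full bilinear tensor needs MOTION_DIM × IMPACT_DIM × OUT_DIM.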