Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection

arXiv cs.LG / 5/6/2026


Key Points

  • The paper proposes MP-IB, a framework that treats mixed-precision quantization as an “information bottleneck” to separate stable speaker traits from volatile agitation states on resource-limited edge devices.
  • It relies on an information-asymmetry design: an FP16 trait head (1,024 bits) and an INT4 state head (128 bits) differ 8× in capacity, which constrains which factors each head can encode and removes the need for adversarial training.
  • MP-IB adds Dynamic Precision Scheduling and Multi-Scale Temporal Fusion to improve clinical trait-state disentanglement performance.
  • On Bridge2AI-Voice (N=833, strict speaker-independent CV), MP-IB reaches rho=0.117 (p=0.003 vs. chance) and beats several baselines by 0.028–0.159 in absolute rho (2.8–15.9 points); zero-shot transfer to CREMA-D reaches AUC=0.817.
  • The method suppresses identity leakage to near-random levels (EER=0.42, MIA-AUC=0.52) while meeting real-time constraints (23.4 ms end-to-end latency, ~617 KB footprint), enabling monitoring on devices costing under $20.
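
The bit counts in the key points imply an 8:1 capacity ratio (1,024 / 128 = 8). The sketch below makes that arithmetic concrete under an assumption the summary does not state: a 64-dimensional FP16 trait vector (64 × 16 = 1,024 bits) and a 32-dimensional INT4 state vector (32 × 4 = 128 bits). The dimensions, the helper names (`to_fp16`, `quantize_int4`, `dequantize_int4`) and the symmetric per-tensor INT4 scheme are illustrative choices, not the paper's implementation.

```python
import struct

# Hypothetical head sizes, chosen only so the bit budgets match the quoted figures.
TRAIT_DIMS, TRAIT_BITS = 64, 16  # assumed FP16 trait embedding: 64 x 16 = 1,024 bits
STATE_DIMS, STATE_BITS = 32, 4   # assumed INT4 state embedding: 32 x 4 = 128 bits


def to_fp16(values):
    """Round-trip Python floats through IEEE 754 half precision (struct format 'e')."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


def quantize_int4(values):
    """Symmetric per-tensor INT4 quantization.

    Returns integer codes clamped to the signed 4-bit range [-8, 7] and one float
    scale. The scale is side information not counted in the 128-bit budget.
    """
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = max_abs / 7.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map INT4 codes back to floats; for codes from quantize_int4 the error is <= scale / 2."""
    return [c * scale for c in codes]


trait_budget = TRAIT_DIMS * TRAIT_BITS
state_budget = STATE_DIMS * STATE_BITS
print(trait_budget, state_budget, trait_budget // state_budget)  # 1024 128 8

# Toy state vector (not data from the paper).
state = [0.9, -0.35, 0.12, -0.7]
codes, scale = quantize_int4(state)
recon = dequantize_int4(codes, scale)
print(codes, [round(r, 3) for r in recon])
```

Only the code bits count toward the 128-bit budget in this sketch; a real INT4 head would also store scales (and possibly zero points), so the exact accounting depends on how the authors define capacity.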

Abstract

Continuous monitoring of bipolar disorder agitation via voice biomarkers requires disentangling stable speaker traits from volatile affective states on resource-constrained edge devices. We introduce MP-IB, the first framework to treat mixed-precision quantization as an information bottleneck for clinical trait-state separation. The core insight is that numerical precision itself controls capacity: an FP16 trait head (1,024 bits) encodes speaker identity, while an INT4 state head (128 bits) captures agitation, yielding an 8× information asymmetry without adversarial training. We augment this with Dynamic Precision Scheduling and Multi-Scale Temporal Fusion. On Bridge2AI-Voice (N=833, 4 sessions/participant, strict speaker-independent CV), MP-IB achieves rho = 0.117 (95% CI: [0.089, 0.145], p=0.003 vs. chance), outperforming a 94M-parameter WavLM-Adapter with in-domain SSL continuation (rho = -0.042), β-VAE disentanglement (rho = 0.089), and hand-crafted prosody (rho = 0.031) by 2.8–15.9 absolute points. Zero-shot transfer to CREMA-D achieves AUC=0.817. Identity leakage is suppressed to near-random (EER=0.42, MIA-AUC=0.52). End-to-end latency is 23.4 ms with a 617 KB footprint, enabling real-time monitoring on sub-$20 devices.
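
The abstract reports rho with a 95% confidence interval but does not say how the interval was computed. Because each participant contributes 4 sessions, one reasonable choice (an assumption here, not the authors' documented procedure) is a cluster bootstrap that resamples whole participants rather than individual sessions. The sketch below computes Spearman's rho as the Pearson correlation of average ranks and a percentile cluster-bootstrap interval; the function names (`average_ranks`, `spearman`, `cluster_bootstrap_ci`) and the synthetic data are hypothetical.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def cluster_bootstrap_ci(pred, target, groups, n_boot=2000, alpha=0.05, seed=0):
    """Approximate percentile bootstrap CI for rho that resamples whole groups (speakers)."""
    by_group = {}
    for p, t, g in zip(pred, target, groups):
        by_group.setdefault(g, []).append((p, t))
    keys = list(by_group)
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Draw speakers with replacement and keep all of each speaker's sessions.
        pairs = [pt for g in rng.choices(keys, k=len(keys)) for pt in by_group[g]]
        xs = [p for p, _ in pairs]
        ys = [t for _, t in pairs]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # rho is undefined when one side is constant
        stats.append(spearman(xs, ys))
    stats.sort()
    lo = stats[int(alpha / 2 * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi


# Synthetic example: 50 hypothetical speakers x 4 sessions, weak true association.
rng = random.Random(42)
groups = [s for s in range(50) for _ in range(4)]
target = [rng.gauss(0, 1) for _ in groups]
pred = [0.3 * t + rng.gauss(0, 1) for t in target]
rho = spearman(pred, target)
lo, hi = cluster_bootstrap_ci(pred, target, groups, n_boot=500)
print(f"rho={rho:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```

Resampling speakers rather than sessions keeps each participant's four correlated sessions together; a session-level bootstrap would treat them as independent and would tend to produce an interval that is too narrow.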