PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination

arXiv cs.CV · April 15, 2026


Key Points

  • PianoFlow is an arXiv paper on audio-driven bimanual piano motion generation that targets accurate modeling of musical structure and dynamic coordination between the hands.
  • The method uses MIDI as a privileged training modality to inject symbolic musical priors, while enabling audio-only inference at generation time.
  • PianoFlow introduces an asymmetric role-gated interaction module to explicitly model cross-hand coordination via role-aware attention and temporal gating.
  • To support real-time streaming for arbitrarily long sequences, it adds an autoregressive flow continuation scheme to maintain temporal coherence across chunks.
  • Experiments on the PianoMotion10M dataset reportedly show better qualitative and quantitative performance and over 9× faster inference than prior approaches.

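The paper does not spell out the internals of the role-gated interaction module, but the description ("role-aware attention and temporal gating" between hands) suggests a gated cross-attention pattern. The sketch below is a minimal NumPy illustration of that pattern, not the authors' implementation: one hand's features attend to the other hand's, and a learned per-frame gate controls how much cross-hand context is admitted. All weight names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def role_gated_attention(lead, follow, w_q, w_k, w_v, w_gate, b_gate):
    """One direction of an asymmetric cross-hand interaction (sketch):
    the 'follow' hand queries the 'lead' hand, and a per-frame temporal
    gate decides how much attended context to admit.
    Shapes: lead, follow are (T, D); all weight matrices are (D, D)."""
    q = follow @ w_q                     # queries from the following hand
    k = lead @ w_k                       # keys from the leading hand
    v = lead @ w_v                       # values from the leading hand
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, T)
    context = attn @ v                   # attended cross-hand context
    gate = sigmoid(follow @ w_gate + b_gate)  # per-frame, per-channel gate
    return follow + gate * context       # gated residual update
```

Asymmetry would come from running this in both directions with separate (untied) weights, so the left-to-right and right-to-left influences are modeled independently rather than mirrored.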
Abstract

Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling its structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
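The streaming scheme described in the abstract can be illustrated with a toy sketch. In flow matching, a sample is produced by integrating a learned velocity field from noise toward data; for streaming, each new chunk can be conditioned on the tail of the previous chunk so consecutive chunks join coherently. The code below is an assumed, simplified reading of that idea, with a hypothetical `velocity_fn` standing in for the trained model and a plain Euler sampler; it is not the paper's implementation.

```python
import numpy as np

def euler_flow_sample(velocity_fn, x0, cond, steps=8):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (motion)
    with simple Euler steps -- the standard few-step flow-matching sampler."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x

def stream_chunks(velocity_fn, n_chunks, chunk_len, dim, overlap, rng):
    """Autoregressive flow continuation (sketch): each chunk is sampled
    conditioned on the last `overlap` frames of the previous chunk, so
    arbitrarily long motion can be generated chunk by chunk."""
    motion = []
    context = np.zeros((overlap, dim))     # empty context for the first chunk
    for _ in range(n_chunks):
        noise = rng.normal(size=(chunk_len, dim))
        chunk = euler_flow_sample(velocity_fn, noise, context)
        motion.append(chunk)
        context = chunk[-overlap:]         # carry the tail forward
    return np.concatenate(motion, axis=0)
```

Because each chunk only requires a few Euler steps through the velocity field, this chunked formulation is also consistent with the reported inference speedup over iterative short-sequence baselines, though the actual mechanism and step count are the paper's, not this sketch's.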