PianoFlow: Music-Aware Streaming Piano Motion Generation with Bimanual Coordination

arXiv cs.CV · April 15, 2026


Key Points

  • PianoFlow is an arXiv paper on audio-driven bimanual piano motion generation that targets accurate modeling of musical structure and dynamic coordination between the hands.
  • The method uses MIDI as a privileged training modality to inject symbolic musical priors, while enabling audio-only inference at generation time.
  • PianoFlow introduces an asymmetric role-gated interaction module to explicitly model cross-hand coordination via role-aware attention and temporal gating.
  • To support real-time streaming for arbitrarily long sequences, it adds an autoregressive flow continuation scheme to maintain temporal coherence across chunks.
  • Experiments on the PianoMotion10M dataset reportedly show better qualitative and quantitative performance and over 9× faster inference than prior approaches.

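The paper does not spell out the internals of the role-gated interaction module, but the description ("role-aware attention and temporal gating" between hands) suggests a gated cross-attention pattern. The sketch below is a minimal NumPy illustration of that pattern, not the authors' implementation: one hand's features attend to the other hand's, and a learned per-frame gate controls how much cross-hand context is admitted. All weight names and shapes here are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def role_gated_attention(lead, follow, w_q, w_k, w_v, w_gate, b_gate):
    """One direction of an asymmetric cross-hand interaction (sketch):
    the 'follow' hand queries the 'lead' hand, and a per-frame temporal
    gate decides how much attended context to admit.
    Shapes: lead, follow are (T, D); all weight matrices are (D, D)."""
    q = follow @ w_q                     # queries from the following hand
    k = lead @ w_k                       # keys from the leading hand
    v = lead @ w_v                       # values from the leading hand
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, T)
    context = attn @ v                   # attended cross-hand context
    gate = sigmoid(follow @ w_gate + b_gate)  # per-frame, per-channel gate
    return follow + gate * context       # gated residual update
```

Asymmetry would come from running this in both directions with separate (untied) weights, so the left-to-right and right-to-left influences are modeled independently rather than mirrored.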
Abstract

Audio-driven bimanual piano motion generation requires precise modeling of complex musical structures and dynamic cross-hand coordination. However, existing methods often rely on acoustic-only representations lacking symbolic priors, employ inflexible interaction mechanisms, and are limited to computationally expensive short-sequence generation. To address these limitations, we propose PianoFlow, a flow-matching framework for precise and coordinated bimanual piano motion synthesis. Our approach strategically leverages MIDI as a privileged modality during training, distilling its structured musical priors to achieve deep semantic understanding while maintaining audio-only inference. Furthermore, we introduce an asymmetric role-gated interaction module to explicitly capture dynamic cross-hand coordination through role-aware attention and temporal gating. To enable real-time streaming generation for arbitrarily long sequences, we design an autoregressive flow continuation scheme that ensures seamless cross-chunk temporal coherence. Extensive experiments on the PianoMotion10M dataset demonstrate that PianoFlow achieves superior quantitative and qualitative performance, while accelerating inference by over 9× compared to previous methods.
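The streaming scheme described in the abstract can be illustrated with a toy sketch. In flow matching, a sample is produced by integrating a learned velocity field from noise toward data; for streaming, each new chunk can be conditioned on the tail of the previous chunk so consecutive chunks join coherently. The code below is an assumed, simplified reading of that idea, with a hypothetical `velocity_fn` standing in for the trained model and a plain Euler sampler; it is not the paper's implementation.

```python
import numpy as np

def euler_flow_sample(velocity_fn, x0, cond, steps=8):
    """Integrate dx/dt = v(x, t, cond) from t=0 (noise) to t=1 (motion)
    with simple Euler steps -- the standard few-step flow-matching sampler."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x

def stream_chunks(velocity_fn, n_chunks, chunk_len, dim, overlap, rng):
    """Autoregressive flow continuation (sketch): each chunk is sampled
    conditioned on the last `overlap` frames of the previous chunk, so
    arbitrarily long motion can be generated chunk by chunk."""
    motion = []
    context = np.zeros((overlap, dim))     # empty context for the first chunk
    for _ in range(n_chunks):
        noise = rng.normal(size=(chunk_len, dim))
        chunk = euler_flow_sample(velocity_fn, noise, context)
        motion.append(chunk)
        context = chunk[-overlap:]         # carry the tail forward
    return np.concatenate(motion, axis=0)
```

Because each chunk only requires a few Euler steps through the velocity field, this chunked formulation is also consistent with the reported inference speedup over iterative short-sequence baselines, though the actual mechanism and step count are the paper's, not this sketch's.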