Soft-MSM: Differentiable Context-Aware Elastic Alignment for Time Series

arXiv cs.LG / 5/4/2026

📰 News · Tools & Practical Usage · Models & Research

Key Points

  • Soft-MSM is introduced as a differentiable, context-aware elastic alignment loss that smooths the Move-Split-Merge (MSM) distance for gradient-based time series learning.
  • The method replaces MSM’s piecewise split/merge transition costs with a smooth gated surrogate, enabling gradients to flow through both the dynamic-programming recursion and the local, alignment-dependent transition structure.
  • The paper derives the forward/backward recursions, soft alignment matrix, closed-form gradients, and discusses limiting behavior and a divergence-corrected formulation.
  • Experiments on 112 UCR datasets show Soft-MSM achieves lower MSM barycentre loss than prior MSM barycentre methods and improves clustering and nearest-centroid classification versus Soft-DTW-based alternatives.
  • An open-source implementation is provided in the aeon toolkit, facilitating adoption in time-series ML workflows.
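The key mechanism shared by Soft-DTW and Soft-MSM is replacing the hard minimum in the dynamic-programming recursion with a smooth relaxation so gradients can flow through the alignment. The paper's exact recursion is not reproduced here, but the standard smoothed-minimum operator (the log-sum-exp form used by Soft-DTW, with smoothing parameter gamma) can be sketched as:

```python
import numpy as np

def soft_min(values, gamma):
    """Smooth relaxation of min: -gamma * log(sum(exp(-v / gamma))).
    Recovers the hard minimum as gamma -> 0; larger gamma gives a
    smoother (and smaller) surrogate. Numerically stabilised by
    subtracting the true minimum before exponentiating."""
    v = np.asarray(values, dtype=float)
    m = v.min()  # stabilise the exponentials
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))
```

With a tiny gamma, `soft_min([1.0, 2.0, 3.0], 1e-3)` is indistinguishable from the hard minimum 1.0, while `soft_min([1.0, 2.0, 3.0], 1.0)` drops below 1.0 because all candidates contribute; this trade-off between fidelity and smoothness is what the divergence-corrected formulation mentioned above is designed to manage.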

Abstract

Elastic distances like dynamic time warping (DTW) are central to time series machine learning because they compare sequences under local temporal misalignment. Soft-DTW is an adaptation of DTW that can be used as a gradient-based loss by replacing the hard minimum in its dynamic-programming recursion with a smooth relaxation. However, this approach does not directly extend to elastic distances whose transition costs depend on the local alignment context. Move-Split-Merge (MSM) is one such distance: it uses context-aware split and merge penalties and has often outperformed DTW in supervised and unsupervised time series machine learning tasks such as classification and clustering. We introduce Soft-MSM, a smooth relaxation of MSM and an elastic alignment loss with context-aware transition costs. Central to the formulation is a smooth gated surrogate for MSM's piecewise split/merge cost, which enables gradients through both the dynamic-programming recursion and the local transition structure. We derive the forward recursion, backward recursion, soft alignment matrix, closed-form gradient, limiting behaviour, and divergence-corrected formulation. Experiments on 112 UCR datasets show that Soft-MSM gives lower MSM barycentre loss than existing MSM barycentre methods, and yields significantly better clustering and nearest-centroid classification performance than Soft-DTW-based alternatives. An implementation is available in the open-source aeon toolkit.
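To see why MSM needs more than Soft-DTW's trick, consider its piecewise split/merge cost (following Stefan et al., 2013): the penalty is a flat constant c when the new point lies between its two neighbours, and otherwise c plus the distance to the nearer neighbour. This interval test makes the cost itself non-differentiable, which is what the paper's gated surrogate smooths. The sketch below shows the classic cost alongside one hypothetical sigmoid-gated smoothing (the function names, the gate form, and the temperature `tau` are illustrative assumptions, not the paper's exact surrogate):

```python
import numpy as np

def msm_cost(x, prev, other, c=1.0):
    """Classic piecewise MSM split/merge cost: flat penalty c when x lies
    between its neighbours, otherwise c plus the overshoot distance to the
    nearer neighbour. Non-differentiable at the interval boundaries."""
    if min(prev, other) <= x <= max(prev, other):
        return c
    return c + min(abs(x - prev), abs(x - other))

def smooth_msm_cost(x, prev, other, c=1.0, tau=0.05):
    """Illustrative sigmoid-gated surrogate (hypothetical form): a product
    of two sigmoids softly indicates membership in [lo, hi], so the cost
    varies smoothly with x, prev and other instead of switching branches."""
    lo, hi = min(prev, other), max(prev, other)
    inside = 1.0 / (1.0 + np.exp(-(x - lo) / tau)) \
           * 1.0 / (1.0 + np.exp((x - hi) / tau))  # soft indicator of lo <= x <= hi
    overshoot = min(abs(x - prev), abs(x - other))
    return c + (1.0 - inside) * overshoot
```

As `tau` shrinks, the gated version approaches the piecewise cost, mirroring the limiting behaviour the paper derives for Soft-MSM as its smoothing parameter goes to zero.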