Deep Kuratowski Embedding Neural Networks for Wasserstein Metric Learning

arXiv cs.LG / 4/7/2026

Key Points

  • The paper introduces two neural architectures, DeepKENN and ODE-KENN, that learn to approximate the Wasserstein-2 (W2) distance from data, so that downstream pairwise distances need not be computed exactly.
  • DeepKENN aggregates distances computed on all intermediate feature maps of a CNN, combining them with learnable positive weights.
  • ODE-KENN replaces the discrete CNN layer stack with a Neural ODE, embedding each input as a trajectory in the infinite-dimensional function space C^1([0,1], R^d); the smoothness of the trajectories acts as implicit regularization.
  • Experiments on MNIST with precomputed exact W2 distances show ODE-KENN achieving 28% lower test mean-squared error than a single-layer baseline and 18% lower than DeepKENN under matched parameter counts, with a smaller generalization gap.
  • The authors argue the learned surrogate can serve as a fast replacement for an expensive W2 “oracle” in downstream pairwise distance computations.
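
To make the DeepKENN idea concrete, the following is a minimal sketch of a weighted aggregation of per-layer distances, not the authors' implementation. Everything specific in it is an assumption made for illustration: the `toy_layers` function is a hypothetical stand-in for a CNN, the per-layer distance is taken to be Euclidean, and positivity of the weights is enforced with a softplus reparameterization. The summary above only states that the weights are learnable and positive.

```python
import math

def softplus(t):
    """Map any real number to a strictly positive one (assumed positivity trick)."""
    return math.log1p(math.exp(t))

def l2_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def toy_layers(x):
    """Hypothetical stand-in for a CNN: returns every intermediate feature vector."""
    h1 = [max(0.0, 0.5 * a - 0.1) for a in x]             # affine map followed by ReLU
    h2 = [h1[i] + h1[i + 1] for i in range(len(h1) - 1)]  # width-2 sliding sum
    h3 = [sum(h2) / len(h2)]                              # global average pooling
    return [h1, h2, h3]

def aggregated_distance(x, y, raw_weights, features=toy_layers):
    """Sum over layers of softplus(raw_weight) * distance between feature maps."""
    fx, fy = features(x), features(y)
    weights = [softplus(t) for t in raw_weights]
    return sum(w * l2_distance(a, b) for w, a, b in zip(weights, fx, fy))

# Example call with arbitrary inputs and unconstrained raw weights.
x = [0.2, 0.9, 0.4, 0.7]
y = [0.8, 0.1, 0.6, 0.3]
raw_weights = [0.0, -1.0, 2.0]
print(aggregated_distance(x, y, raw_weights))
```

In a training loop, `raw_weights` and the feature extractor would be fitted so that `aggregated_distance` matches the precomputed W2 targets, for example under a mean-squared-error loss. Because each term is a positive multiple of a pseudometric, the sum is symmetric, non-negative, and satisfies the triangle inequality regardless of the weight values.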

Abstract

Computing pairwise Wasserstein distances is a fundamental bottleneck in data analysis pipelines. Motivated by the classical Kuratowski embedding theorem, we propose two neural architectures for learning to approximate the Wasserstein-2 distance (W_2) from data. The first, DeepKENN, aggregates distances across all intermediate feature maps of a CNN using learnable positive weights. The second, ODE-KENN, replaces the discrete layer stack with a Neural ODE, embedding each input into the infinite-dimensional Banach space C^1([0,1], \mathbb{R}^d) and providing implicit regularization via trajectory smoothness. Experiments on MNIST with exact precomputed W_2 distances show that ODE-KENN achieves a 28% lower test MSE than the single-layer baseline and 18% lower than DeepKENN under matched parameter counts, while exhibiting a smaller generalization gap. The resulting fast surrogate can replace the expensive W_2 oracle in downstream pairwise distance computations.
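
The Kuratowski embedding that motivates both architectures can be illustrated on a finite sample. The sketch below uses the simplest finite-sample form: each point is mapped to the vector of its distances to every point in the sample, and the sup-norm distance between two such vectors recovers the original distance exactly. The point set, the Manhattan metric, and the function names are assumptions chosen so the arithmetic is exact in integers; this is an illustration of the classical theorem, not of how the paper's networks are built.

```python
def manhattan(p, q):
    """Manhattan (L1) distance between two integer points; stands in for any metric."""
    return sum(abs(a - b) for a, b in zip(p, q))

def kuratowski_embed(points, metric):
    """Map point i to the vector of its distances to every point in the sample."""
    return [[metric(p, q) for q in points] for p in points]

def sup_distance(u, v):
    """Sup-norm (l-infinity) distance between two embedded vectors."""
    return max(abs(a - b) for a, b in zip(u, v))

points = [(0, 0), (3, 1), (1, 4), (5, 5)]
embedded = kuratowski_embed(points, manhattan)

# The embedding is an isometry: sup-norm distances equal the original distances.
for i, p in enumerate(points):
    for j, q in enumerate(points):
        assert sup_distance(embedded[i], embedded[j]) == manhattan(p, q)
```

The equality holds because the triangle inequality bounds every coordinate difference by the original distance, while the coordinate indexed by the second point attains it. A learned surrogate replaces these exact distance coordinates, which would themselves require the expensive oracle, with features produced by a network.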