AI Navigate

GeMA: Learning Latent Manifold Frontiers for Benchmarking Complex Systems

arXiv cs.LG / 3/18/2026


Key Points

  • The paper introduces Geometric Manifold Analysis (GeMA), implemented with a productivity-manifold variational autoencoder (ProMan-VAE), which represents production frontiers as boundaries of a low-dimensional latent manifold in the joint input-output space.
  • A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Endogenous peer groups arise as clusters in latent technology space, and a quotient construction supports scale-invariant benchmarking.
  • Efficiency is measured relative to the learned manifold. A local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of the efficiency scores.
  • The method is validated on synthetic data (non-convex frontiers, heterogeneous technologies, scale bias) and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset. GeMA behaves comparably to traditional frontier methods when classical assumptions hold and provides additional insight in heterogeneous, non-convex or size-biased settings.
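The certification radius in the third point can be illustrated with a toy example. The sketch below is a minimal, hypothetical illustration, not the paper's ProMan-VAE or its exact formula. It assumes a hand-written 1-D decoder, estimates its Jacobian by finite differences, and applies the standard Lipschitz argument: if the decoder is L-Lipschitz, any latent perturbation smaller than eps / L moves the decoded input-output point by less than eps. The names `decoder`, `jacobian_fd`, `spectral_norm_column` and `certification_radius` are illustrative only.

```python
import math

# Hypothetical toy decoder (NOT the paper's ProMan-VAE): a 1-D latent
# coordinate z >= 0 is mapped to an (input, output) point on the concave
# frontier y = sqrt(x), via x = z**2 and y = z.
def decoder(z):
    return (z * z, z)


def jacobian_fd(g, z, h=1e-6):
    """Central finite-difference Jacobian of g at scalar z, as a column list."""
    plus, minus = g(z + h), g(z - h)
    return [(p - m) / (2.0 * h) for p, m in zip(plus, minus)]


def spectral_norm_column(col):
    """Spectral norm of an n x 1 matrix: the Euclidean norm of its column."""
    return math.sqrt(sum(c * c for c in col))


def certification_radius(eps, lipschitz):
    """Latent radius r = eps / L. If g is L-Lipschitz, any |dz| < r moves the
    decoded point by less than eps."""
    if lipschitz <= 0:
        raise ValueError("Lipschitz constant must be positive")
    return eps / lipschitz


if __name__ == "__main__":
    z0, eps = 1.0, 0.1
    # Local slope at z0 from the Jacobian (about sqrt(5) for this decoder).
    local_l = spectral_norm_column(jacobian_fd(decoder, z0))
    # Global bound on z in [0, 2]: ||g'(z)|| = sqrt(4 z^2 + 1) <= sqrt(17).
    global_l = math.sqrt(17.0)
    print("first-order radius:", certification_radius(eps, local_l))
    print("certified radius:  ", certification_radius(eps, global_l))
```

The Jacobian norm yields only a first-order, local radius. In this example it is larger than the certified one, and stepping forward by it moves the decoded point slightly more than eps because the decoder's slope grows with z. That gap is why a Lipschitz bound over a region is needed to turn the local estimate into a guarantee.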

Abstract

Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and a local certification radius, derived from the decoder Jacobian and a Lipschitz bound, quantifies the geometric robustness of efficiency scores. We validate GeMA on synthetic data with non-convex frontiers, heterogeneous technologies and scale bias, and on four real-world case studies: global urban rail systems (COMET), British rail operators (ORR), national economies (Penn World Table) and a high-frequency wind-farm dataset. Across these domains GeMA behaves comparably to established methods when classical assumptions hold, and provides additional insight in settings with pronounced heterogeneity, non-convexity or size-related bias.
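As a point of reference for the classical baselines described above, the simplest DEA case fits in a few lines. This sketch assumes constant returns to scale (the CCR model) with a single input and a single output. In that case the frontier is the ray through the unit with the highest output/input ratio, and each unit's radial (Farrell) efficiency is its ratio divided by the best ratio. The function name `crs_dea_single` is hypothetical; the multi-input, multi-output or variable-returns case requires solving one linear program per unit instead.

```python
def crs_dea_single(inputs, outputs):
    """Radial (Farrell) efficiency under constant returns to scale for the
    one-input, one-output case. The CCR frontier is the ray through the unit
    with the highest output/input ratio, so each score is ratio / best ratio
    (1.0 means the unit lies on the frontier)."""
    if not inputs or len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must be non-empty and equal length")
    if any(x <= 0 for x in inputs) or any(y < 0 for y in outputs):
        raise ValueError("inputs must be positive and outputs non-negative")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    if best == 0:
        raise ValueError("at least one unit must have positive output")
    return [r / best for r in ratios]


if __name__ == "__main__":
    print(crs_dea_single([2, 4, 3, 5], [2, 3, 3, 2]))  # [1.0, 0.75, 1.0, 0.4]
    # Rescale unit 1 (input 4, output 3) by 10x: its CRS score is unchanged.
    print(crs_dea_single([2, 40, 3, 5], [2, 30, 3, 2]))
```

Because the score depends only on the output/input ratio, scaling a unit's input and output by the same factor leaves its score unchanged. Under DEA this invariance comes from the constant-returns assumption itself; the quotient construction described in the abstract is GeMA's separate route to scale-invariant benchmarking.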