Worst-case low-rank approximations

arXiv cs.AI / 3/13/2026

Key Points

  • The paper introduces wcPCA, a unified framework for optimizing the worst-case (rather than average) performance of PCA across heterogeneous domains subject to distributional shift.
  • It derives new estimators, such as norm-minPCA and norm-maxregret, that are better suited to applications where total variance differs across domains.
  • It proves that the estimators are worst-case optimal over the observed source domains and over every target domain whose covariance lies in the convex hull of the (possibly normalized) source covariances, and it establishes consistency of the empirical estimators.
  • It extends the methodology to matrix completion and proves approximate worst-case optimality for inductive matrix completion. Simulations and two real-world applications on ecosystem-atmosphere fluxes show marked improvements in worst-case performance with only minor losses in average performance.
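
The convex-hull guarantee in the third point can be made concrete for the simplest objective, explained variance. The notation below (E source covariances \Sigma_1, ..., \Sigma_E, a p-by-k matrix V with orthonormal columns, weights w_e) is assumed here for illustration and is not taken from the paper. The sketch covers only the explained-variance objective; the regret-based and normalized variants would need their own arguments.

```latex
% Explained variance of the subspace spanned by V under covariance \Sigma:
\[
  f(V, \Sigma) = \operatorname{tr}\!\left(V^{\top} \Sigma V\right),
  \qquad V^{\top} V = I_k .
\]
% Any target covariance in the convex hull of the sources can be written as
\[
  \Sigma = \sum_{e=1}^{E} w_e \Sigma_e,
  \qquad w_e \ge 0, \quad \sum_{e=1}^{E} w_e = 1 .
\]
% Because f is linear in \Sigma, its value at \Sigma is a weighted average:
\[
  f(V, \Sigma) = \sum_{e=1}^{E} w_e \, f(V, \Sigma_e)
  \;\ge\; \min_{1 \le e \le E} f(V, \Sigma_e).
\]
% Hence the worst case over the whole hull is attained at a source covariance:
\[
  \min_{\Sigma \in \operatorname{conv}\{\Sigma_1, \dots, \Sigma_E\}} f(V, \Sigma)
  = \min_{1 \le e \le E} f(V, \Sigma_e).
\]
```

Under this reading, a subspace that maximizes the worst-case explained variance over the observed sources also maximizes it over every target in the hull, since the two worst-case values coincide for every V.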

Abstract

Real-world data in health, economics, and environmental sciences are often collected across heterogeneous domains (such as hospitals, regions, or time periods). In such settings, distributional shifts can make standard PCA unreliable, in that, for example, the leading principal components may explain substantially less variance in unseen domains than in the training domains. Existing approaches (such as FairPCA) have proposed to consider worst-case (rather than average) performance across multiple domains. This work develops a unified framework, called wcPCA, applies it to other objectives (resulting in the novel estimators such as norm-minPCA and norm-maxregret, which are better suited for applications with heterogeneous total variance) and analyzes their relationship. We prove that for all objectives, the estimators are worst-case optimal not only over the observed source domains but also over all target domains whose covariance lies in the convex hull of the (possibly normalized) source covariances. We establish consistency and asymptotic worst-case guarantees of empirical estimators. We extend our methodology to matrix completion, another problem that makes use of low-rank approximations, and prove approximate worst-case optimality for inductive matrix completion. Simulations and two real-world applications on ecosystem-atmosphere fluxes demonstrate marked improvements in worst-case performance, with only minor losses in average performance.
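
To illustrate the quantity the abstract is concerned with, here is a minimal sketch that evaluates a pooled-PCA subspace domain by domain and reports its worst-case explained variance. It does not implement wcPCA or any estimator from the paper. The helper names (`top_k_subspace`, `explained_variance`, `random_covariance`), the synthetic covariances, and the use of the trace as "total variance" are all assumptions made for this example.

```python
import numpy as np


def top_k_subspace(cov, k):
    """Orthonormal basis (p x k) spanned by the top-k eigenvectors of cov."""
    _, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]    # reverse columns, keep the k leading ones


def explained_variance(V, cov):
    """Variance captured by projecting onto the columns of V: tr(V^T cov V)."""
    return float(np.trace(V.T @ cov @ V))


def random_covariance(rng, p, scale=1.0):
    """Synthetic positive semidefinite matrix (illustrative data only)."""
    A = rng.standard_normal((p, p))
    return scale * (A @ A.T) / p


rng = np.random.default_rng(0)
p, k = 6, 2

# Three hypothetical source domains with different total variance.
source_covs = [random_covariance(rng, p, scale=s) for s in (1.0, 2.0, 5.0)]

# Standard PCA on the pooled (averaged) covariance.
pooled_cov = sum(source_covs) / len(source_covs)
V_pooled = top_k_subspace(pooled_cov, k)

# Per-domain performance: absolute variance captured and the fraction of
# each domain's total variance (trace) that the pooled subspace explains.
per_domain = [explained_variance(V_pooled, c) for c in source_covs]
fractions = [v / np.trace(c) for v, c in zip(per_domain, source_covs)]
worst_source = min(per_domain)

# Sample target covariances from the convex hull of the sources and check
# that none is explained worse than the worst source domain.
weights = rng.dirichlet(np.ones(len(source_covs)), size=1000)
hull_values = [
    explained_variance(V_pooled, sum(w_e * c for w_e, c in zip(w, source_covs)))
    for w in weights
]

print("explained variance per source domain:", np.round(per_domain, 3))
print("fraction of total variance per domain:", np.round(fractions, 3))
print("worst case over sources:", round(worst_source, 3))
print("worst case over sampled hull targets:", round(min(hull_values), 3))
```

The per-domain fractions show how a subspace fitted to pooled data can serve some domains much better than others, which is the failure mode that motivates a worst-case objective. Dividing by the trace is one plausible way to account for heterogeneous total variance; whether it matches the normalization used by norm-minPCA or norm-maxregret is not stated in the abstract.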