On computing and the complexity of computing higher-order $U$-statistics, exactly

arXiv stat.ML / 4/1/2026

💬 OpinionSignals & Early TrendsIdeas & Deep AnalysisTools & Practical UsageModels & Research

Key Points

  • The paper addresses a gap in understanding the computational complexity of higher-order U-statistics by deriving results on how they can be computed more efficiently.
  • It provides a decomposition method that rewrites an m-th order U-statistic as a linear combination of lower/equal-order V-statistics, which are typically easier to compute.
  • It connects exact computation of V-statistics to Einstein summation, linking the problem to tensor-computation acceleration techniques used in computational mathematics and quantum computing.
  • Using graph/treewidth arguments, it gives an optimistic time-complexity estimate for exactly computing U-statistics and characterizes runtime behavior more systematically.
  • The authors release an open-source implementation (u-stats) in Python and R and report improved runtime performance on statistical examples versus existing benchmarks.

Abstract

Higher-order U-statistics abound in fields such as statistics, machine learning, and computer science, but are known to be highly time-consuming to compute in practice. Despite their widespread appearance, a comprehensive study of their computational complexity is surprisingly lacking. This paper aims to fill this gap by presenting several results related to the computational aspect of U-statistics. First, we derive a useful decomposition from a m-th order U-statistic to a linear combination of V-statistics with orders not exceeding m, which are generally more feasible to compute. Second, we explore the connection between exactly computing V-statistics and Einstein summation, a tool often used in computational mathematics and quantum computing to accelerate tensor computations. Third, we provide an optimistic estimate of the time complexity for exactly computing U-statistics, based on the treewidth of a particular graph associated with the U-statistic kernel. The above ingredients lead to (1) a new, much more runtime-efficient algorithm to exactly compute general higher-order U-statistics, and (2) a more streamlined characterization of runtime complexity of computing U-statistics. We develop an accompanying open-source package called \texttt{u-stats} in both Python (https://github.com/zrq1706/U-Statistics-Python) and R (https://github.com/cxy0714/U-Statistics-R). We demonstrate through three examples in statistics that \texttt{u-stats} achieves impressive runtime performance compared to existing benchmarks. This paper also aspires to achieve two goals: (1) to capture the interest of researchers in both statistics and other related areas to further advance the algorithmic development of U-statistics and (2) to lift the burden of implementing higher-order U-statistics from practitioners.