High-dimensional Many-to-many-to-many Mediation Analysis

arXiv stat.ML / 4/6/2026

Key Points

  • The paper introduces a “many-to-many-to-many” (MMM) mediation analysis framework for settings where exposures, mediators, and outcomes are all multivariate, and where both exposures and mediators may be high-dimensional.
  • MMM mediation jointly performs variable selection, estimates an indirect-effect matrix capturing exposure→mediator and mediator→outcome pathways, and supports prediction of multivariate outcomes.
  • The authors provide theoretical guarantees, showing consistency and element-wise asymptotic normality of the estimated indirect effect matrices, along with derived estimation error bounds.
  • Simulation studies assess finite-sample performance, including convergence properties, the behavior of the asymptotic approximations, and robustness to noise.
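  • An application to Alzheimer's Disease Neuroimaging Initiative (ADNI) data examines how cortical thickness in 202 brain regions may mediate the effects of 688 genome-wide significant SNPs (selected from approximately 1.5 million) on eleven cognitive-behavioral and diagnostic outcomes. The framework identifies interpretable genetic-neural-cognitive pathways and improves out-of-sample classification and prediction; the MMM package is available at https://github.com/THELabTop/MMM-Mediation.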
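
For orientation, the "indirect-effect matrix" in the points above can be written out under the standard linear structural equation formulation of mediation. The notation here (X, M, Y, A, B, C, Θ, and the dimensions n, p, m, q) is assumed for illustration and may differ from the paper's own model:

```latex
% Assumed notation: n samples, p exposures, m mediators, q outcomes.
% X \in \mathbb{R}^{n \times p},\; M \in \mathbb{R}^{n \times m},\; Y \in \mathbb{R}^{n \times q}
\begin{align}
  M &= X A + E_1, & A &\in \mathbb{R}^{p \times m}
    && \text{(exposure-to-mediator)} \\
  Y &= M B + X C + E_2, & B &\in \mathbb{R}^{m \times q},\; C \in \mathbb{R}^{p \times q}
    && \text{(mediator-to-outcome, direct)} \\
  \Theta &= A B \in \mathbb{R}^{p \times q}, &
  \Theta_{jk} &= \sum_{l=1}^{m} A_{jl} B_{lk}
    && \text{(indirect effect)}
\end{align}
```

Under these assumptions, entry Θ_{jk} sums the effect of exposure j on outcome k over all m mediators, and substituting the first equation into the second gives a total effect of C + AB.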

Abstract

We study high-dimensional mediation analysis in which exposures, mediators, and outcomes are all multivariate, and both exposures and mediators may be high-dimensional. We formalize this as a many (exposures)-to-many (mediators)-to-many (outcomes) (MMM) mediation analysis problem. Methodologically, MMM mediation analysis simultaneously performs variable selection for high-dimensional exposures and mediators, estimates the indirect effect matrix (i.e., the coefficient matrices linking exposure-to-mediator and mediator-to-outcome pathways), and enables prediction of multivariate outcomes. Theoretically, we show that the estimated indirect effect matrices are consistent and element-wise asymptotically normal, and we derive error bounds for the estimators. To evaluate the efficacy of the MMM mediation framework, we first investigate its finite-sample performance, including convergence properties, the behavior of the asymptotic approximations, and robustness to noise, via simulation studies. We then apply MMM mediation analysis to data from the Alzheimer's Disease Neuroimaging Initiative to study how cortical thickness of 202 brain regions may mediate the effects of 688 genome-wide significant single nucleotide polymorphisms (SNPs) (selected from approximately 1.5 million SNPs) on eleven cognitive-behavioral and diagnostic outcomes. The MMM mediation framework identifies biologically interpretable, many-to-many-to-many genetic-neural-cognitive pathways and improves downstream out-of-sample classification and prediction performance. Taken together, our results demonstrate the potential of MMM mediation analysis and highlight the value of statistical methodology for investigating complex, high-dimensional multi-layer pathways in science. The MMM package is available at https://github.com/THELabTop/MMM-Mediation.
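
To make the indirect effect matrix concrete, here is a minimal sketch of the classical product-of-coefficients estimate on simulated low-dimensional data. It is not the paper's estimator: it uses plain least squares with no variable selection, and it assumes a linear model with centered variables. The function name `indirect_effect_matrix` and all variable names are hypothetical and are not taken from the MMM package.

```python
import numpy as np


def indirect_effect_matrix(X, M, Y):
    """Product-of-coefficients estimate of the (p x q) indirect effect matrix.

    Assumes the linear model M = X A + E1 and Y = M B + X C + E2 with
    centered variables (no intercept) and more samples than predictors.
    """
    # Stage 1: exposure-to-mediator coefficients A (p x m) from M ~ X.
    A_hat, *_ = np.linalg.lstsq(X, M, rcond=None)
    # Stage 2: regress Y on mediators and exposures jointly, so that the
    # mediator-to-outcome coefficients B (m x q) are adjusted for X.
    Z = np.hstack([M, X])
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    B_hat = coef[: M.shape[1]]
    # The indirect effect of exposure j on outcome k sums over mediators.
    return A_hat @ B_hat


# Simulated example: 4 exposures, 3 mediators, 2 outcomes.
rng = np.random.default_rng(0)
n, p, m, q = 5000, 4, 3, 2
A_true = rng.normal(size=(p, m))
B_true = rng.normal(size=(m, q))
C_true = rng.normal(size=(p, q))  # direct effects of exposures on outcomes

X = rng.normal(size=(n, p))
M = X @ A_true + 0.5 * rng.normal(size=(n, m))
Y = M @ B_true + X @ C_true + 0.5 * rng.normal(size=(n, q))

est = indirect_effect_matrix(X, M, Y)
max_abs_err = np.max(np.abs(est - A_true @ B_true))
print("estimated indirect effect matrix:\n", est)
print("max abs error vs. truth:", max_abs_err)
```

Plain least squares like this breaks down once the number of exposures or mediators approaches or exceeds the sample size, which is the regime the abstract targets by combining estimation with variable selection.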