DeepCausalMMM: A Deep Learning Framework for Marketing Mix Modeling with Causal Structure Learning

arXiv stat.ML / 4/28/2026

Key Points

  • DeepCausalMMM is a new deep-learning-based Marketing Mix Modeling (MMM) framework that combines causal inference and marketing science to better capture temporal effects, non-linearities, and inter-channel dependencies.
  • The model uses Gated Recurrent Units (GRUs) to learn temporal dynamics such as adstock and lag, and it learns statistical dependencies between channels through a Directed Acyclic Graph (DAG) whose structure is constrained to be upper triangular.
  • It incorporates Hill-equation saturation curves to model diminishing returns, and includes budget optimization to translate response estimates into actionable planning.
  • The framework emphasizes practicality through data-driven hyperparameters (with sensible defaults), configurable attribution priors with dynamic loss scaling, a Huber loss for robustness to outliers, and support for multi-region modeling with shared and region-specific parameters.
  • It also provides response curve analysis capabilities to evaluate and interpret how each channel affects the target business outcome over time and at different spend levels.
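To make the adstock and saturation ideas in the points above concrete, here is a minimal sketch in plain Python. It is not DeepCausalMMM's code: the framework learns carry-over with GRUs, whereas the classical geometric adstock below is a fixed-decay stand-in shown only for intuition. The function names (`hill_saturation`, `geometric_adstock`) and parameters (`half_saturation`, `slope`, `decay`) are hypothetical and chosen for illustration.

```python
def hill_saturation(spend, half_saturation, slope):
    """Hill curve: spend^slope / (spend^slope + half_saturation^slope).

    Returns a value in [0, 1) that flattens as spend grows, which is
    the diminishing-returns shape. Parameter names are assumptions.
    """
    if spend <= 0:
        return 0.0
    powered = spend ** slope
    return powered / (powered + half_saturation ** slope)


def geometric_adstock(spend_series, decay):
    """Classical carry-over: each period keeps `decay` of the prior effect.

    This fixed-decay form is an illustrative stand-in; the summary says
    the framework learns such dynamics with GRUs instead.
    """
    carried = 0.0
    adstocked = []
    for spend in spend_series:
        carried = spend + decay * carried
        adstocked.append(carried)
    return adstocked


weekly_spend = [100.0, 0.0, 0.0, 50.0]  # made-up spend for one channel
adstocked = geometric_adstock(weekly_spend, decay=0.5)
# adstocked is [100.0, 50.0, 25.0, 62.5]

response = [hill_saturation(x, half_saturation=100.0, slope=2.0) for x in adstocked]
# At spend equal to half_saturation the Hill curve returns exactly 0.5.
```

Evaluating `hill_saturation` over a grid of spend levels gives the kind of response curve the last point refers to: steep at low spend and nearly flat well above `half_saturation`.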

Abstract

Marketing Mix Modeling (MMM) estimates the impact of marketing activities on business outcomes such as sales or revenue. Traditional MMM approaches rely on linear regression or Bayesian hierarchical models that assume channel independence and struggle to capture temporal dynamics and non-linear saturation. DeepCausalMMM addresses these limitations by combining deep learning, causal inference, and marketing science. It uses Gated Recurrent Units (GRUs) to learn temporal patterns (adstock, lag) while learning statistical dependencies between channels through a Directed Acyclic Graph (DAG) structure with upper triangular constraints. It implements Hill equation saturation curves to model diminishing returns, and it supports budget optimization. Key features: (1) data-driven hyperparameters with defaults, (2) linear mean scaling of the dependent variable, (3) configurable attribution priors with dynamic loss scaling, (4) multi-region modeling with shared and region-specific parameters, (5) robust methods including Huber loss, and (6) response curve analysis.
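Two ingredients named in the abstract can be illustrated without any deep learning library. The sketch below shows (a) why an upper triangular constraint yields an acyclic graph, under the assumption that the constraint is applied as a mask on a channel-to-channel weight matrix, and (b) the standard Huber loss. All function names and example weights are hypothetical, and this is not the package's implementation.

```python
def upper_triangular_mask(n):
    """mask[i][j] is 1 only when i < j, so an edge can only point from
    an earlier channel to a later one in a fixed ordering."""
    return [[1 if i < j else 0 for j in range(n)] for i in range(n)]


def apply_mask(weights, mask):
    """Zero every weight the mask disallows (element-wise product)."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def has_cycle(adjacency):
    """Depth-first search cycle check; a nonzero entry [i][j] is an edge i -> j."""
    n = len(adjacency)
    state = [0] * n  # 0 = unvisited, 1 = on the current path, 2 = finished

    def visit(node):
        state[node] = 1
        for nxt in range(n):
            if adjacency[node][nxt] != 0:
                if state[nxt] == 1:
                    return True
                if state[nxt] == 0 and visit(nxt):
                    return True
        state[node] = 2
        return False

    return any(state[i] == 0 and visit(i) for i in range(n))


def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear for large ones, so a single
    outlier contributes less than it would under squared error."""
    magnitude = abs(residual)
    if magnitude <= delta:
        return 0.5 * magnitude ** 2
    return delta * (magnitude - 0.5 * delta)


# Made-up unconstrained weights among three channels.
raw_weights = [[0.3, 0.8, -0.2],
               [0.5, 0.1, 0.4],
               [0.7, -0.6, 0.9]]
masked_weights = apply_mask(raw_weights, upper_triangular_mask(3))

raw_is_cyclic = has_cycle(raw_weights)        # True: self-loops and back edges
masked_is_cyclic = has_cycle(masked_weights)  # False: only i -> j with i < j remains
```

One consequence of this design is that acyclicity holds by construction, with no separate penalty term needed, at the cost of fixing the channel ordering in advance.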