Entire Space Counterfactual Learning for Reliable Content Recommendations

arXiv stat.ML / 3/26/2026

Key Points

  • The paper addresses post-click CVR estimation for recommender systems, highlighting data sparsity and sample selection bias as key obstacles.
  • It critiques prior entire-space multitask approaches for two defects: intrinsic estimation bias (IEB), where CVR estimates are higher than the actual values, and false independence prior (FIP), where the causal dependence of conversions on clicks may be overlooked.
  • The authors propose a model-agnostic framework, Entire Space Counterfactual Multitask Model (ESCM$^2$), which adds a counterfactual risk minimizer to ESMM to regularize CVR estimation.
  • Experiments on large-scale industrial datasets and an online industrial recommendation service show ESCM$^2$ reduces both IEB and FIP and improves overall recommendation performance.
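
To illustrate the sample selection bias named in the points above: conversion labels exist only for clicked items, so a CVR loss averaged over clicks is not an average over all exposures. The sketch below contrasts that click-space average with an inverse propensity scoring (IPS) estimate, one standard counterfactual risk estimator. This summary does not say which estimator the authors use, so the IPS choice, the function names, and the `min_prop` clipping value are assumptions for illustration only.

```python
import math

def log_loss(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and one label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def naive_cvr_risk(p_cvr, clicks, conversions):
    """Average CVR loss over clicked samples only (the click space)."""
    losses = [log_loss(p, r)
              for p, o, r in zip(p_cvr, clicks, conversions) if o == 1]
    return sum(losses) / len(losses)

def ips_cvr_risk(p_cvr, clicks, conversions, propensities, min_prop=1e-3):
    """IPS estimate of the CVR loss over all exposures.

    propensities: assumed click probability per exposure, for example the
    output of a CTR model. Clicked samples are reweighted by 1 / propensity;
    unclicked samples contribute nothing because their label is unobserved.
    """
    total = 0.0
    for p, o, r, q in zip(p_cvr, clicks, conversions, propensities):
        if o == 1:
            total += log_loss(p, r) / max(q, min_prop)  # clip tiny propensities
    return total / len(clicks)
```

When every exposure has the same click propensity, the two estimates agree up to the observed click rate; they diverge once clicks are concentrated on particular users or items, which is the case the reweighting is meant to handle.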

Abstract

Post-click conversion rate (CVR) estimation is a fundamental task in developing effective recommender systems, yet it faces challenges from data sparsity and sample selection bias. To handle both challenges, the entire space multitask models are employed to decompose the user behavior track into a sequence of exposure $\rightarrow$ click $\rightarrow$ conversion, constructing surrogate learning tasks for CVR estimation. However, these methods suffer from two significant defects: (1) intrinsic estimation bias (IEB), where the CVR estimates are higher than the actual values; (2) false independence prior (FIP), where the causal relationship between clicks and subsequent conversions is potentially overlooked. To overcome these limitations, we develop a model-agnostic framework, namely Entire Space Counterfactual Multitask Model (ESCM$^2$), which incorporates a counterfactual risk minimizer within the ESMM framework to regularize CVR estimation. Experiments conducted on large-scale industrial recommendation datasets and an online industrial recommendation service demonstrate that ESCM$^2$ effectively mitigates IEB and FIP defects and substantially enhances recommendation performance.
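
The exposure, click, conversion decomposition described in the abstract is commonly written as pCTCVR = pCTR × pCVR, which lets the CTR and CTCVR tasks be supervised on every exposure while CVR is learned as a factor of their product. A minimal sketch of such an objective follows, with an optional CVR regularization term added on clicked samples. The function name `entire_space_objective`, the weight `lam`, the clipping value `min_prop`, and the use of an IPS-weighted term as the regularizer are all assumptions made for illustration, not the authors' implementation.

```python
import math

def log_loss(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and one label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def entire_space_objective(p_ctr, p_cvr, clicks, conversions,
                           lam=0.0, min_prop=1e-3):
    """CTR loss + CTCVR loss over all exposures, plus lam * CVR regularizer.

    p_ctr, p_cvr: hypothetical per-exposure model outputs in (0, 1).
    clicks, conversions: 0/1 labels; a conversion implies a click.
    """
    n = len(clicks)
    ctr = ctcvr = cvr_reg = 0.0
    for pc, pv, o, r in zip(p_ctr, p_cvr, clicks, conversions):
        ctr += log_loss(pc, o)              # click label, every exposure
        ctcvr += log_loss(pc * pv, o * r)   # click-and-convert label, every exposure
        if o == 1:
            # Assumed regularizer: CVR loss on clicked samples, reweighted by
            # the inverse of the (clipped) predicted click probability.
            cvr_reg += log_loss(pv, r) / max(pc, min_prop)
    return (ctr + ctcvr + lam * cvr_reg) / n
```

With `lam=0` this reduces to a plain entire-space multitask loss; a positive `lam` adds direct supervision on the CVR head. In a trained model the propensity inside the regularizer would typically be treated as a constant rather than differentiated through, which this scalar sketch does not model.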