Bivariate Causal Discovery Using Rate-Distortion MDL: An Information Dimension Approach

arXiv cs.LG / 4/8/2026

Key Points

  • The paper introduces rate-distortion MDL (RDMDL), a new bivariate causal discovery method built on the minimum description length (MDL) principle and rate-distortion theory.
  • It argues that current state-of-the-art MDL-based causal discovery methods do not correctly estimate the description length of the cause variable, so the direction decision is effectively left to the description length of the causal mechanism.
  • RDMDL measures the cause’s description length as the minimum rate required to reach a distortion level representative of the underlying distribution, with that distortion level deduced using rules from histogram-based density estimation.
  • The method computes the rate using an information-dimension-based asymptotic approximation and combines it with a conventional description-length approach for the causal mechanism.
  • Experiments on the Tübingen dataset show RDMDL achieves competitive results, and the authors provide publicly available code and experiments.
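
The comparison described in the points above can be written compactly. The block below is a sketch in generic notation, not the paper's exact formulation: the symbols L(·), [X]_ε, and d(X) are assumptions introduced here. The first line is the usual two-part MDL score per direction, and the last is the standard Rényi definition of information dimension, which gives the kind of asymptotic approximation the summary refers to. How RDMDL maps its distortion level to the resolution ε is not stated in this summary.

```latex
% Two-part description length in each candidate direction
L_{X \to Y} = L(X) + L(Y \mid X), \qquad
L_{Y \to X} = L(Y) + L(X \mid Y)

% Decision rule: infer X -> Y when L_{X \to Y} < L_{Y \to X},
% and Y -> X when the inequality is reversed.

% Renyi information dimension, with [X]_\varepsilon denoting X
% quantized to cells of width \varepsilon
d(X) = \lim_{\varepsilon \to 0} \frac{H([X]_\varepsilon)}{\log(1/\varepsilon)}
\quad\Longrightarrow\quad
H([X]_\varepsilon) \approx d(X)\,\log\frac{1}{\varepsilon}
\quad \text{for small } \varepsilon
```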

Abstract

Approaches to bivariate causal discovery based on the minimum description length (MDL) principle approximate the (uncomputable) Kolmogorov complexity of the models in each causal direction, selecting the one with the lower total complexity. The premise is that nature's mechanisms are simpler in their true causal order. Inherently, the description length (complexity) in each direction includes the description of the cause variable and that of the causal mechanism. In this work, we argue that current state-of-the-art MDL-based methods do not correctly address the problem of estimating the description length of the cause variable, effectively leaving the decision to the description length of the causal mechanism. Based on rate-distortion theory, we propose a new way to measure the description length of the cause, corresponding to the minimum rate required to achieve a distortion level representative of the underlying distribution. This distortion level is deduced using rules from histogram-based density estimation, while the rate is computed using the related concept of information dimension, based on an asymptotic approximation. Combining it with a traditional approach for the causal mechanism, we introduce a new bivariate causal discovery method, termed rate-distortion MDL (RDMDL). We show experimentally that RDMDL achieves competitive performance on the Tübingen dataset. All the code and experiments are publicly available at github.com/tiagobrogueira/Causal-Discovery-In-Exchangeable-Data.
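
To make the idea of a resolution-dependent description length for the cause more concrete, here is a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes the Freedman-Diaconis rule as one example of a histogram bin-width rule (the abstract does not say which rule RDMDL uses), measures the empirical entropy of the samples quantized at a given width, and compares it with the information-dimension approximation d · log2(1/width), taking d = 1 as an assumption for a continuous scalar variable.

```python
import numpy as np


def freedman_diaconis_width(x):
    """Histogram bin width from the Freedman-Diaconis rule: 2 * IQR * n^(-1/3).

    Assumption: used here only as an example of a histogram rule.
    """
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)


def quantized_entropy_bits(x, width):
    """Empirical entropy, in bits, of x after uniform quantization to cells of the given width."""
    x = np.asarray(x, dtype=float)
    cells = np.floor(x / width).astype(np.int64)  # index of the cell each sample falls in
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def asymptotic_rate_bits(width, info_dim=1.0, offset_bits=0.0):
    """Information-dimension approximation: info_dim * log2(1 / width) + offset_bits.

    Assumption: info_dim = 1 for a continuous scalar; offset_bits stands in for
    the distribution-dependent constant term and defaults to zero.
    """
    return float(info_dim * np.log2(1.0 / width) + offset_bits)


# Deterministic stand-in for uniform samples on [0, 1).
x = (np.arange(1024) + 0.5) / 1024

print(quantized_entropy_bits(x, 1 / 8))   # entropy of 8 equally filled cells
print(asymptotic_rate_bits(1 / 8))        # approximation at the same width

w = freedman_diaconis_width(x)
print(w, quantized_entropy_bits(x, w), asymptotic_rate_bits(w))
```

For the uniform grid the empirical entropy and the approximation agree at width 1/8, because the constant term is zero for a unit-width uniform density. For other distributions the two differ by a distribution-dependent offset, which is why `offset_bits` is exposed as a parameter.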