Causal Representation Learning on High-Dimensional Data: Benchmarks, Reproducibility, and Evaluation Metrics
arXiv cs.LG / 3/19/2026
Key Points
- The paper critically analyzes existing synthetic and real-world causal representation learning (CRL) datasets, highlights their limitations, and proposes characteristics that a suitable CRL benchmark should have.
- It discusses how CRL evaluation spans reconstruction, disentanglement, causal discovery, and counterfactual reasoning, which complicates cross-model comparisons.
- It introduces a single aggregate metric that combines performance across evaluation directions to provide a comprehensive score for CRL models.
- It reviews published implementations to assess reproducibility and suggests best practices to improve repeatability across experiments.
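
To illustrate the idea of one score that spans several evaluation directions, here is a minimal Python sketch. It is not the paper's metric, whose definition is not given in this summary. The sketch assumes each direction has already been reduced to a score in [0, 1] where higher is better, and it combines them with a weighted arithmetic mean. The axis names, the `aggregate_score` function, and the weighting scheme are all hypothetical.

```python
# Hypothetical axis names, taken from the four evaluation directions listed above.
AXES = ("reconstruction", "disentanglement", "causal_discovery", "counterfactual")


def aggregate_score(scores, weights=None):
    """Weighted arithmetic mean of per-axis scores.

    Assumption: every score is already normalized to [0, 1], higher is better.
    This is an illustrative aggregation, not the metric proposed in the paper.
    """
    if weights is None:
        # Default to equal weighting across all axes.
        weights = {axis: 1.0 for axis in AXES}

    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")

    for axis in AXES:
        if not 0.0 <= scores[axis] <= 1.0:
            raise ValueError(f"score for {axis} must lie in [0, 1]")

    total_weight = sum(weights[axis] for axis in AXES)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    return sum(weights[axis] * scores[axis] for axis in AXES) / total_weight


if __name__ == "__main__":
    example = {
        "reconstruction": 0.9,
        "disentanglement": 0.7,
        "causal_discovery": 0.5,
        "counterfactual": 0.3,
    }
    print(aggregate_score(example))
```

A single number of this kind makes models easier to rank, but it hides trade-offs between directions, so the per-axis scores are worth reporting alongside it.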