MOCA: A Transformer-based Modular Causal Inference Framework with One-way Cross-attention and Cutting Feedback
arXiv stat.ML / April 28, 2026
Key Points
- MOCA (Modular One-way Causal Attention) is a transformer-based framework for estimating causal effects from observational data, designed to handle confounding more robustly when treatment and outcome mechanisms are complex, nonlinear, and high-dimensional.
- The method uses a modular design that separates treatment modeling from outcome modeling and applies one-way cross-attention to adjust for confounders while preserving causal directionality (a sketch of this attention pattern follows the list).
- A “cutting-feedback” strategy, implemented via gradient detachment, prevents the outcome loss from updating the treatment module, avoiding undesirable leakage of outcome information into treatment-side representations (a minimal training-step sketch also follows the list).
- Experiments on multiple simulated settings and two real-world benchmarks (Infant Health and Development Program and Dehejia–Wahba datasets) show competitive or improved performance versus established estimators and neural causal inference baselines like IPW/AIPW, X-learner, TARNet, and DragonNet.
- The authors argue that modular attention with one-way information flow is a promising, more interpretable direction for combining causal inference with modern deep learning.
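To make the one-way attention idea concrete, here is a minimal PyTorch sketch assuming the outcome branch issues queries against confounder representations, with no symmetric attention call in the reverse direction. The class and argument names (OneWayCrossAttention, outcome_tokens, confounder_tokens) are illustrative assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn

class OneWayCrossAttention(nn.Module):
    """Outcome-side queries attend over confounder representations.
    There is no reverse attention path, so information flows in one
    direction only. (Illustrative sketch; names are assumptions.)"""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, outcome_tokens: torch.Tensor,
                confounder_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the outcome module; keys and values come
        # from the confounder encoder.
        adjusted, _ = self.attn(
            query=outcome_tokens,
            key=confounder_tokens,
            value=confounder_tokens,
        )
        return adjusted

# Usage: a batch of 8 sequences, 4 outcome tokens attending over
# 16 confounder tokens, embedding dimension 32.
attn = OneWayCrossAttention(dim=32)
out = attn(torch.randn(8, 4, 32), torch.randn(8, 16, 32))
```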
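The cutting-feedback idea maps naturally onto gradient detachment. The sketch below, in which the module names, dimensions, and loss choices are all hypothetical, shows how `.detach()` lets the outcome model read the treatment representation while blocking outcome-loss gradients from flowing back into the treatment module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical minimal modules; architectures and sizes are assumptions.
treatment_net = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
propensity_head = nn.Linear(32, 1)
outcome_net = nn.Sequential(nn.Linear(10 + 1 + 32, 32), nn.ReLU(),
                            nn.Linear(32, 1))

x = torch.randn(64, 10)                    # covariates
t = torch.randint(0, 2, (64, 1)).float()   # binary treatment
y = torch.randn(64, 1)                     # outcome

z_t = treatment_net(x)  # treatment-side representation
treatment_loss = F.binary_cross_entropy_with_logits(propensity_head(z_t), t)

# Cutting feedback via gradient detachment: the outcome model may read
# z_t, but .detach() excludes it from the outcome loss's backward graph,
# so outcome gradients cannot reshape treatment-side features.
y_hat = outcome_net(torch.cat([x, t, z_t.detach()], dim=-1))
outcome_loss = F.mse_loss(y_hat, y)

(treatment_loss + outcome_loss).backward()
```

Because the detached tensor is cut out of the autograd graph, the combined backward pass updates treatment_net only through the treatment loss, which is the one-way behavior the summary describes.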