Batch-Adaptive Causal Annotations
arXiv stat.ML / 4/21/2026
Key Points
- The paper tackles efficient estimation of causal effects when outcomes are missing and measurement error may be non-standard, a common issue in policy and decision-making.
- It formulates an optimal batch sampling strategy that selects which data points to label for outcomes by minimizing the asymptotic variance of a doubly robust (AIPW) causal estimator.
- The authors derive a closed-form expression for the optimal batch sampling probability, improving efficiency in average treatment effect (ATE) estimation under missing outcomes.
- The method extends to costly annotations of unstructured data (e.g., text and images) in healthcare and social services. Experiments on simulated and real datasets, including homelessness street outreach interventions, show lower mean-squared error and fewer labels needed.
- In practice, the approach can reproduce confidence intervals from 361 random samples using only 90 optimized samples, cutting labeling cost by about 75%.
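The paper's closed-form sampling probability is not reproduced in this summary. As a rough illustration of the idea behind variance-minimizing label selection, the sketch below assumes a Neyman-style allocation: each unit's labeling probability is proportional to an estimate of its conditional outcome standard deviation, scaled to meet a labeling budget and capped at 1. This is a stand-in technique, not the authors' formula. The names `neyman_sampling_probs`, `variance_proxy`, `cond_sd`, and `budget_fraction` are hypothetical. The last lines check the label-saving arithmetic quoted above.

```python
import numpy as np


def neyman_sampling_probs(cond_sd, budget_fraction):
    """Labeling probabilities proportional to cond_sd (an assumed estimate of
    each unit's conditional outcome standard deviation), scaled so they sum to
    budget_fraction * n and capped at 1. A Neyman-style allocation used for
    illustration only, not the paper's closed-form expression."""
    cond_sd = np.asarray(cond_sd, dtype=float)
    n = cond_sd.size
    budget = budget_fraction * n          # expected number of labels
    probs = np.zeros(n)
    capped = np.zeros(n, dtype=bool)      # units whose probability hit 1
    while True:
        free = ~capped
        remaining = budget - capped.sum()
        total = cond_sd[free].sum()
        if not free.any() or total <= 0:
            break
        # Spread the remaining budget over uncapped units, proportional to sd.
        probs[free] = remaining * cond_sd[free] / total
        newly = free & (probs > 1.0)
        if not newly.any():
            break
        capped |= newly
        probs[capped] = 1.0
    return probs


def variance_proxy(cond_sd, probs):
    """Average of sd^2 / pi: the part of an inverse-probability-weighted
    estimator's variance that depends on the labeling probabilities."""
    cond_sd = np.asarray(cond_sd, dtype=float)
    return float(np.mean(cond_sd ** 2 / probs))


cond_sd = np.array([1.0, 1.0, 3.0, 3.0])             # made-up noise levels
adaptive = neyman_sampling_probs(cond_sd, 0.5)       # label half the units
uniform = np.full(cond_sd.size, 0.5)                 # same budget, uniform

print(adaptive)
print(variance_proxy(cond_sd, uniform), variance_proxy(cond_sd, adaptive))

# Label saving reported in the summary: 90 optimized vs. 361 random samples.
label_saving = 1 - 90 / 361
print(round(label_saving, 3))
```

In this toy setup the adaptive probabilities spend more of the budget on the noisier units, which lowers the variance proxy relative to uniform sampling at the same expected cost. A unit with an estimated standard deviation of zero receives probability zero here, so a practical version would need a lower bound on the probabilities to keep the inverse weights finite.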