An Interdisciplinary and Cross-Task Review on Missing Data Imputation
arXiv stat.ML, April 27, 2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- Missing data remains a major obstacle to analysis and decision-making across many domains, and the current research landscape is fragmented across fields and methods.
- The review bridges statistical foundations with modern machine learning by systematically covering missingness mechanisms, single vs. multiple imputation, imputation goals, and domain-specific problem characteristics.
- It categorizes imputation approaches from classical methods (e.g., regression imputation, expectation-maximization) to modern techniques, including low- and high-rank matrix completion, deep learning models (autoencoders, GANs, diffusion models, graph neural networks), and large language model-based methods.
- Special emphasis is placed on handling complex data types (tensors, time series, streaming, graphs, categorical, and multimodal data) and on how imputation should integrate with downstream tasks like classification, clustering, and anomaly detection.
- The article also evaluates theoretical guarantees, benchmarks, and metrics, and highlights future challenges such as model/hyperparameter selection, privacy-preserving imputation via federated learning, and developing generalizable models across domains and data types.
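The chained-equations idea behind classical regression imputation, which is also the usual starting point for multiple imputation, can be sketched in a few lines. This is an illustrative sketch only: the function names and the plain least-squares model are my choices for exposition, not methods taken from the paper.

```python
import numpy as np

def mean_impute(X):
    """Fill each column's NaNs with that column's observed mean (single imputation)."""
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = col_means[nan_cols]
    return X

def regression_impute(X, n_iter=10):
    """MICE-style chained-equations sketch: each column with missing values is
    repeatedly re-predicted by a least-squares regression on the other columns,
    starting from a mean-imputed fill."""
    mask = np.isnan(X)          # True where entries were originally missing
    X = mean_impute(X)          # initial fill
    n, d = X.shape
    for _ in range(n_iter):
        for j in range(d):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.column_stack([np.ones(n), others])        # intercept + predictors
            beta, *_ = np.linalg.lstsq(A[~miss], X[~miss, j], rcond=None)
            X[miss, j] = A[miss] @ beta                      # refresh missing entries only
    return X

# Toy usage: column 1 is exactly 2 * column 0, so the missing entry is recoverable.
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, np.nan], [4.0, 8.0]])
print(regression_impute(X)[2, 1])  # → 6.0
```

Multiple imputation would repeat this procedure with randomness injected (e.g., noise added to the regression draws) to produce several completed datasets, so that downstream analyses can propagate imputation uncertainty rather than treating one fill as the truth.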