Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
arXiv cs.LG / 4/7/2026
Key Points
- The paper studies how “data fixing” interventions can fail or backfire when training data comes from multiple sources with distribution shifts, leading to volatile subgroup fairness outcomes.
- Using an ICU/healthcare setting with EHR-derived datasets (the eICU Collaborative Research Database and MIMIC-IV), the authors find that adding data can either improve or worsen both subgroup fairness and overall performance.
- The research shows that many intuitive data-selection strategies are unreliable for subgroup outcomes, especially when added data introduces new biases or shifts.
- It compares data-centric addition approaches with model-based post-hoc calibration and concludes that combining the two is important for improving subgroup performance.
- The findings challenge the common belief that “better data alone” is sufficient to address fairness problems in machine-learning decision systems.
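To make the comparison in the points above concrete, here is a minimal, purely illustrative sketch (not the paper's code; all data, names, and thresholds are hypothetical) of evaluating a risk model per subgroup and then applying a simple post-hoc, group-specific threshold adjustment of the kind model-based calibration methods use:

```python
# Illustrative sketch: per-subgroup evaluation plus a toy post-hoc
# calibration step. Scores, labels, and thresholds are all hypothetical.

def accuracy(scores, labels, threshold):
    """Fraction of examples where (score >= threshold) matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical risk scores and outcomes for two subgroups whose score
# distributions are shifted relative to each other (a distribution shift
# like those the paper studies across data sources).
group_a = ([0.2, 0.4, 0.7, 0.9], [0, 0, 1, 1])
group_b = ([0.1, 0.3, 0.5, 0.6], [0, 0, 1, 1])

# A single shared threshold serves group A well but misclassifies a
# positive case in group B, opening a subgroup performance gap.
shared = 0.55
acc_a = accuracy(*group_a, shared)
acc_b = accuracy(*group_b, shared)

def best_threshold(scores, labels, grid):
    """Toy post-hoc calibration: pick the per-group threshold that
    maximizes that group's accuracy on held-out data."""
    return max(grid, key=lambda t: accuracy(scores, labels, t))

grid = [i / 20 for i in range(21)]
t_b = best_threshold(*group_b, grid)
acc_b_cal = accuracy(*group_b, t_b)
```

In this toy setup the shared threshold yields a gap (`acc_a` > `acc_b`) that the group-specific threshold closes; the paper's point is that data addition alone does not reliably achieve this, which is why it pairs data-centric fixes with such model-based post-hoc steps.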