FAIR_XAI: Improving Multimodal Foundation Model Fairness via Explainability for Wellbeing Assessment
arXiv cs.AI / 4/28/2026
📰 News · Models & Research
Key Points
- The study examines how multimodal vision-language foundation models (VLMs) perform in wellbeing and depression assessment across both laboratory and naturalistic datasets, with particular attention to diagnostic reliability and demographic fairness.
- Results show large performance variation by environment and model architecture, with Phi3.5-Vision reaching 80.4% accuracy on E-DAIC versus Qwen2-VL at 33.9%, and both models tending to over-predict depression on AFAR-BSFT.
- Bias patterns differ by model: Qwen2-VL exhibits higher gender disparities, while Phi-3.5-Vision shows stronger racial bias across the evaluated settings.
- XAI-based fairness interventions produced mixed outcomes: fairness-oriented prompting achieved perfect equal opportunity for Qwen2-VL on E-DAIC, but at a severe accuracy cost, while explainability interventions on AFAR-BSFT improved procedural consistency without ensuring outcome fairness and sometimes amplified racial bias.
- The authors conclude there is a persistent gap between procedural transparency (explainability) and equitable outcomes, recommending that future fairness approaches jointly optimize predictive accuracy, demographic parity, and cross-domain generalization; the two fairness metrics are sketched in code below.
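
The contrast the key points draw between equal opportunity and demographic parity can be made concrete. Below is a minimal sketch, not taken from the paper, of the two group-fairness metrics for a binary depression classifier; all function and variable names are illustrative assumptions, and the toy data simply shows how a model can satisfy one metric while violating the other.

```python
# Minimal sketch: group fairness metrics for a binary depression classifier.
# Assumes binary ground-truth labels (1 = depressed), binary predictions,
# and one sensitive attribute value (e.g., a gender or race group) per subject.
# Names are illustrative, not from the paper.
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group accuracy, true-positive rate, and positive-prediction rate."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0, "pred_pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        s["pred_pos"] += int(p == 1)
        if t == 1:
            s["pos"] += 1
            s["tp"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
            "positive_rate": s["pred_pos"] / s["n"],
        }
        for g, s in stats.items()
    }

def fairness_gaps(rates):
    """Equal opportunity gap = max TPR spread; demographic parity gap = max positive-rate spread."""
    tprs = [r["tpr"] for r in rates.values()]
    pos = [r["positive_rate"] for r in rates.values()]
    return {
        "equal_opportunity_gap": max(tprs) - min(tprs),  # 0.0 means perfect equal opportunity
        "demographic_parity_gap": max(pos) - min(pos),   # 0.0 means demographic parity holds
    }

# Toy usage: the classifier over-predicts depression for group "B".
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = group_rates(y_true, y_pred, groups)
print(rates)
print(fairness_gaps(rates))
```

On this toy data the equal opportunity gap is 0.0 (both groups have a true-positive rate of 1.0) while the demographic parity gap is 0.5, mirroring the pattern in the key points where an intervention can deliver perfect equal opportunity without producing parity in predictions, and at a cost visible in the per-group accuracies.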