Injecting Distributional Awareness into MLLMs via Reinforcement Learning for Deep Imbalanced Regression
arXiv cs.CL / 5/5/2026
📰 News · Developer Stack & Infrastructure · Models & Research
Key Points
- Multimodal large language models (MLLMs) often perform poorly on numerical regression with long-tailed (imbalanced) target distributions, tending to regress toward the mean due to biased token-level supervised fine-tuning.
- The paper identifies a key gap in existing training: insufficient cross-sample relational supervision that would let the model learn how predictions compare across a batch.
- It proposes a distribution-aware reinforcement learning approach using Group Relative Policy Optimization (GRPO) and a Concordance Correlation Coefficient (CCC)-based reward, which is maximized only when predictions match targets in correlation, scale, and mean (a minimal sketch of both pieces follows this list).
- The method is plug-and-play, requiring no architectural changes, and it yields consistent gains on long-tailed regression benchmarks, especially in medium- and few-shot settings.
- Overall, the work suggests that batch-level, comparison-based learning signals can substantially improve MLLM numerical regression for imbalanced data.
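Because the CCC is a batch-level statistic, both ingredients the summary mentions fit in a few lines. Below is a minimal PyTorch sketch of a CCC-based reward over a batch of predictions and a GRPO-style group-relative advantage; the function names and toy data are illustrative assumptions, not code from the paper, whose exact reward shaping and GRPO setup may differ.

```python
import torch

def ccc_reward(preds: torch.Tensor, targets: torch.Tensor,
               eps: float = 1e-8) -> torch.Tensor:
    """Concordance Correlation Coefficient between predictions and targets.

    CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)

    CCC reaches 1 only when predictions agree with targets in correlation,
    scale, and mean, so using it as a reward penalizes the collapse toward
    the mean that per-sample losses tolerate.
    """
    mu_p, mu_t = preds.mean(), targets.mean()
    var_p = preds.var(unbiased=False)
    var_t = targets.var(unbiased=False)
    cov = ((preds - mu_p) * (targets - mu_t)).mean()
    return 2.0 * cov / (var_p + var_t + (mu_p - mu_t) ** 2 + eps)

def group_relative_advantages(rewards: torch.Tensor,
                              eps: float = 1e-8) -> torch.Tensor:
    """GRPO-style advantages: standardize each sampled completion's reward
    against its group's statistics, so no learned value model is needed."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy usage (hypothetical data): two sampled sets of numeric predictions
# for one batch of prompts, scored jointly against long-tailed targets.
targets = torch.tensor([1.0, 3.0, 10.0, 42.0])
group_preds = [
    torch.tensor([1.2, 2.8, 9.5, 40.0]),     # tracks the tail
    torch.tensor([14.0, 14.0, 14.0, 14.0]),  # regresses to the mean
]
rewards = torch.stack([ccc_reward(p, targets) for p in group_preds])
print(rewards)                           # mean-collapsed sample scores ~0
print(group_relative_advantages(rewards))
```

Note how the constant, mean-collapsed prediction earns a CCC near zero despite a modest mean error, which is exactly the cross-sample signal the paper argues token-level fine-tuning lacks.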