MHPR: Multidimensional Human Perception and Reasoning Benchmark for Large Vision-Language Models
arXiv cs.CV / 5/6/2026
Key Points
- The paper introduces MHPR, a new benchmark designed to evaluate large vision-language models (LVLMs) on multidimensional, human-centric perception-and-reasoning tasks across single-person, multi-person, and human–object interaction scenarios.
- MHPR provides a multi-stage dataset framework (C-RD, SFT-D, RL-D, and T-D) plus an automated caption/VQA generation pipeline (ACVG) that uses attribute decomposition, rewriting, and multi-model voting to produce scalable, high-quality annotations.
- Experiments assess state-of-the-art LVLMs on both fine-grained attributes (e.g., appearance, clothing, pose, parts) and higher-level semantics (e.g., social/action/spatial relations and intent/functionality).
- Results indicate that format-aligned supervised fine-tuning data improves instruction following and training stability, while reinforcement learning data focused on “bad cases” further boosts performance on difficult examples.
- Training Qwen2.5-VL-7B with MHPR delivers substantial gains, reaching near-parity with much larger models, and the authors release ACVG and MHPR to support reproducible research.
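The multi-model voting step in the ACVG pipeline described above could be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the normalization, and the agreement threshold are all assumptions; the idea is simply that a caption or VQA annotation is kept only when enough independent models produce the same answer.

```python
from collections import Counter

def multi_model_vote(candidates, min_agreement=2):
    """Keep an annotation only if at least `min_agreement` models agree.

    `candidates` is a list of answer strings, one per model.
    Returns the winning answer, or None if no answer reaches the threshold.
    """
    # Normalize lightly so trivial formatting differences don't split votes.
    normalized = [c.strip().lower() for c in candidates]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer if count >= min_agreement else None
```

For example, `multi_model_vote(["Red coat", "red coat ", "blue jacket"])` returns `"red coat"`, while three mutually disagreeing answers return `None` and the item would be dropped or re-queued for rewriting.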