HiPath: Hierarchical Vision-Language Alignment for Structured Pathology Report Prediction
arXiv cs.CV / 3/23/2026
Key Points
- HiPath is a lightweight vision-language model for predicting structured pathology reports. It adds three trainable modules, totaling 15M parameters, on top of frozen UNI2 and Qwen3 backbones.
- It introduces three components: a Hierarchical Patch Aggregator (HiPA) for multi-image visual encoding, Hierarchical Contrastive Learning (HiCL) for cross-modal alignment via optimal transport, and Slot-based Masked Diagnosis Prediction (Slot-MDP) for structured diagnosis generation.
- HiPath was trained on 749K real-world Chinese pathology cases from three hospitals. It achieves 68.9% strict accuracy and 74.7% clinically acceptable accuracy with a 97.3% safety rate, outperforming baselines built on the same frozen backbones.
- In cross-hospital evaluation, strict accuracy drops by only 3.4 percentage points and the safety rate holds at 97.1%, indicating robustness across institutions.
- The work emphasizes structured report prediction as the primary training objective rather than flat labels or free-text outputs.
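
To make "cross-modal alignment via optimal transport" more concrete, here is a minimal sketch of entropy-regularized optimal transport (Sinkhorn iterations) between a set of image-patch vectors and a set of text-token vectors. The summary does not describe HiCL's actual formulation, so everything below is an assumption for illustration: the cosine cost, the uniform marginals, the `eps` and `n_iters` values, the toy vectors, and the function names `cosine_cost`, `sinkhorn`, and `ot_distance` are all hypothetical and not taken from the paper.

```python
import math


def cosine_cost(patches, tokens):
    """Cost matrix with entries 1 - cosine similarity (an assumed cost choice)."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    cost = []
    for p in patches:
        row = []
        for t in tokens:
            dot = sum(a * b for a, b in zip(p, t))
            row.append(1.0 - dot / (norm(p) * norm(t)))
        cost.append(row)
    return cost


def sinkhorn(cost, eps=0.1, n_iters=200):
    """Entropy-regularized transport plan between uniform marginals."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass on each patch (uniform, assumed)
    b = [1.0 / m] * m  # mass on each token (uniform, assumed)
    # Gibbs kernel: low cost gives high affinity.
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(cost, plan):
    """Total transport cost under the plan; lower means better aligned sets."""
    return sum(c * p for crow, prow in zip(cost, plan) for c, p in zip(crow, prow))


# Toy 2-D "embeddings" standing in for patch and token features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tokens = [[1.0, 0.1], [0.1, 1.0]]

cost = cosine_cost(patches, tokens)
plan = sinkhorn(cost)
print("transport plan:", plan)
print("OT distance:", ot_distance(cost, plan))
```

In a contrastive setup, a distance of this kind could serve as the (negative) similarity score between an image set and a report, with matched pairs pushed toward lower distance than mismatched ones. Whether HiCL uses it this way is not stated in the summary.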