HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning
arXiv cs.CV / March 19, 2026
📰 News · Models & Research
Key Points
- HopChain presents a scalable data-synthesis framework to create multi-hop vision-language reasoning data for RLVR training of VLMs.
- The method builds logically dependent chains of hops whose final answers are precise numbers, enabling verifiable rewards and mitigating errors that arise in long chain-of-thought (CoT) reasoning.
- Empirically, adding HopChain data improves performance on 20 of 24 benchmarks across models and task types (STEM, general VQA, text recognition, document understanding, and video understanding).
- Ablations show that removing or shortening hops significantly degrades performance, while the full multi-hop data yields large gains, reportedly more than 50 accuracy points in the ultra-long-CoT regime, supporting broad generalizability.
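The recipe described in the key points, chaining hops whose outputs feed the next sub-question and grading only a final numeric answer, can be sketched roughly as follows. This is a minimal illustration of the idea, not the paper's implementation; all class, function, and field names here are hypothetical:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hop:
    """One reasoning step: a sub-question whose answer feeds the next hop."""
    question: str
    answer: float  # intermediate numeric result

@dataclass
class MultiHopSample:
    """A synthesized training example: composed prompt plus a numeric target."""
    prompt: str
    final_answer: float

def compose_chain(hops: List[Hop]) -> MultiHopSample:
    """Fold logically dependent hops into one multi-hop question.

    Only the last hop's answer is kept as the target, so the reward
    can be checked by simple numeric comparison (RLVR-style).
    """
    prompt = " ".join(f"Step {i + 1}: {h.question}" for i, h in enumerate(hops))
    return MultiHopSample(prompt=prompt, final_answer=hops[-1].answer)

def verifiable_reward(model_output: str, target: float, tol: float = 1e-6) -> float:
    """Binary reward: 1.0 iff the model's final number matches the target."""
    try:
        pred = float(model_output.strip())
    except ValueError:
        return 0.0  # unparseable output earns no reward
    return 1.0 if abs(pred - target) <= tol else 0.0
```

Because the target is a single precise number, the reward check is cheap and unambiguous, which is what makes such synthesized chains usable for RLVR training without a learned verifier.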
Related Articles
Co-Activation Pattern Detection for Prompt Injection: A Mechanistic Interpretability Approach Using Sparse Autoencoders
Reddit r/LocalLLaMA

How to Train Custom Language Models: Fine-Tuning vs Training From Scratch (2026)
Dev.to

KoboldCpp 1.110 - 3 YR Anniversary Edition, native music gen, qwen3tts voice cloning and more
Reddit r/LocalLLaMA

Qwen3.5 Knowledge density and performance
Reddit r/LocalLLaMA

I think I made the best general use System Prompt for Qwen 3.5 (OpenWebUI + Web search)
Reddit r/LocalLLaMA