Learning When to See and When to Feel: Adaptive Vision-Torque Fusion for Contact-Aware Manipulation
arXiv cs.RO / 4/3/2026
Key Points
- The paper studies how to fuse vision and force/torque (F/T) signals in diffusion-based robotic manipulation policies, focusing on contact-rich tasks where vision alone is insufficient.
- It compares multiple existing integration strategies (e.g., auxiliary prediction objectives, mixture-of-experts, and contact-aware gating) to evaluate their relative effectiveness.
- The authors introduce an adaptive fusion method that suppresses F/T inputs during free-space (non-contact) phases and re-enables them once contact is established, so the policy draws on torque information only when it is informative.
- Experiments show the proposed adaptive approach improves success rate by 14% over the strongest baseline, underscoring the value of contact-aware multimodal fusion.
- Overall, the work provides both a benchmark-style comparison of F/T-vision fusion designs and a practical architectural idea for improving contact-aware manipulation.
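The contact-aware gating idea described above can be sketched in a few lines. The paper's actual architecture, feature dimensions, and gating mechanism are not specified here, so the function names, the threshold-based contact detector, and the hard 0/1 gate below are all illustrative assumptions, not the authors' implementation (a learned soft gate would be a natural alternative):

```python
import numpy as np

def adaptive_fusion(vision_feat, ft_feat, ft_reading, contact_threshold=1.0):
    """Hypothetical sketch of contact-aware vision/F-T fusion.

    vision_feat: visual feature vector (always used).
    ft_feat: encoded force/torque feature vector.
    ft_reading: raw 6-D wrench, used only to detect contact.
    contact_threshold: assumed wrench-magnitude cutoff for "in contact".
    """
    # Crude contact detector: magnitude of the raw wrench exceeds a threshold.
    in_contact = np.linalg.norm(ft_reading) > contact_threshold
    # Hard gate: suppress F/T features entirely in free space,
    # pass them through unchanged during contact.
    gate = 1.0 if in_contact else 0.0
    return np.concatenate([vision_feat, gate * ft_feat])

# Free space: F/T half of the fused vector is zeroed out.
fused_free = adaptive_fusion(np.ones(4), np.ones(3), np.zeros(6))
# In contact: F/T features pass through.
fused_contact = adaptive_fusion(np.ones(4), np.ones(3), np.full(6, 2.0))
```

A hard gate like this is only a caricature; a practical policy would likely learn a smooth gating function end-to-end so the transition between regimes is differentiable.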