PhysQuantAgent: An Inference Pipeline of Mass Estimation for Vision-Language Models
arXiv cs.CV / 3/19/2026
Key Points
- Introduces PhysQuantAgent, a framework that uses vision-language models (VLMs) to estimate the mass of real-world objects, so that robots can choose appropriate grasp forces and interact safely.
- Presents VisPhysQuant, a new RGB-D video dataset annotated with precise mass measurements across multiple viewpoints for evaluating physical quantity estimation.
- Proposes three visual prompting methods that add object detection, scale estimation, and cross-sectional image generation to help the model understand size and internal structure.
- Experimental results show that visual prompting significantly improves mass estimation accuracy on real-world data, indicating the value of integrating spatial reasoning with VLM knowledge for physical inference.
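
To make the link between an estimated mass and grasp force concrete, here is a minimal sketch. It is not taken from the paper, and the summary does not describe PhysQuantAgent's internals. It assumes that mass can be approximated as volume times density, and that a two-finger parallel gripper holds the object against gravity by friction alone. The names `MassEstimate` and `min_grip_force`, the default safety factor, and the example numbers are all hypothetical.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration in m/s^2


@dataclass
class MassEstimate:
    """Hypothetical container: a volume and density guess for one object."""
    volume_m3: float
    density_kg_m3: float

    @property
    def mass_kg(self) -> float:
        # Assumption: uniform density, so mass = volume * density.
        return self.volume_m3 * self.density_kg_m3


def min_grip_force(mass_kg: float, friction_coeff: float,
                   safety_factor: float = 2.0) -> float:
    """Normal force per finger (N) for a two-finger friction grasp.

    Two contacts each supply friction mu * F, so holding the weight
    requires 2 * mu * F >= m * g. The safety factor is an assumed margin.
    """
    if mass_kg <= 0 or friction_coeff <= 0:
        raise ValueError("mass and friction coefficient must be positive")
    return safety_factor * mass_kg * G / (2.0 * friction_coeff)


# Illustrative numbers only: a 0.5 litre object with water-like density.
estimate = MassEstimate(volume_m3=0.0005, density_kg_m3=1000.0)
force_n = min_grip_force(estimate.mass_kg, friction_coeff=0.5)
print(f"mass = {estimate.mass_kg:.2f} kg, grip force per finger = {force_n:.2f} N")
```

In this simplified model the required grip force scales linearly with mass, so an error in the mass estimate carries over directly to the force, which is one reason accurate mass estimation matters for safe grasping.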