PhysMem: Scaling Test-time Physical Memory for Robot Manipulation

arXiv cs.RO · April 22, 2026


Key Points

  • The paper introduces PhysMem, a memory framework that helps VLM-based robot planners learn object-specific physical behavior at test time without updating model parameters.
  • PhysMem stores interaction experiences, generates candidate physical hypotheses, and validates them through targeted experiments before using the knowledge for future planning.
  • The key design principle is “verification before application,” which reduces over-reliance on previously retrieved experiences when friction, stability, or other conditions shift.
  • Experiments on three real-world manipulation tasks and multiple simulation benchmarks across four VLM backbones show large gains, including 76% success on a brick insertion task versus 23% for direct experience retrieval.
  • Real-robot deployments show consistent improvement across 30-minute sessions, indicating that the test-time interaction loop is practical on hardware.

Abstract

Reliable object manipulation requires understanding physical properties that vary across objects and environments. Vision-language model (VLM) planners can reason about friction and stability in general terms; however, they often cannot predict how a specific ball will roll on a particular surface or which stone will provide a stable foundation without direct experience. We present PhysMem, a memory framework that enables VLM robot planners to learn physical principles from interaction at test time, without updating model parameters. The system records experiences, generates candidate hypotheses, and verifies them through targeted interaction before promoting validated knowledge to guide future decisions. A central design choice is verification before application: the system tests hypotheses against new observations rather than applying retrieved experience directly, reducing rigid reliance on prior experience when physical conditions change. We evaluate PhysMem on three real-world manipulation tasks and simulation benchmarks across four VLM backbones. On a controlled brick insertion task, principled abstraction achieves 76% success compared to 23% for direct experience retrieval, and real-world experiments show consistent improvement over 30-minute deployment sessions.
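The record → hypothesize → verify → promote loop described above can be sketched in a few lines. This is a minimal illustration of the "verification before application" principle, not the paper's implementation; all class names, fields, and the error tolerance are hypothetical.

```python
# Minimal sketch of a "verification before application" memory loop,
# loosely modeled on the PhysMem description. All names here are
# hypothetical illustrations, not the paper's API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Hypothesis:
    object_id: str
    claim: str                        # e.g. "rolls ~0.4 m per unit push"
    predict: Callable[[float], float] # maps an interaction context to an outcome
    validated: bool = False

@dataclass
class PhysMem:
    experiences: list = field(default_factory=list)  # raw interaction records
    knowledge: list = field(default_factory=list)    # validated hypotheses only

    def record(self, experience: dict) -> None:
        self.experiences.append(experience)

    def verify(self, hypothesis: Hypothesis, context: float,
               observed_outcome: float, tol: float = 0.1) -> bool:
        """Test a hypothesis against a fresh observation before trusting it."""
        error = abs(hypothesis.predict(context) - observed_outcome)
        hypothesis.validated = error <= tol
        if hypothesis.validated:
            self.knowledge.append(hypothesis)  # promote to planning knowledge
        return hypothesis.validated

    def retrieve(self, object_id: str) -> list:
        """Only validated hypotheses are allowed to guide future planning."""
        return [h for h in self.knowledge if h.object_id == object_id]

# Usage: a hypothesis about a ball's rolling distance is checked
# against a targeted probe before it is used for planning.
mem = PhysMem()
ball = Hypothesis("ball_1", "rolls 0.4 m per unit push",
                  predict=lambda push: 0.4 * push)
mem.record({"object": "ball_1", "push": 1.0, "rolled": 0.42})
mem.verify(ball, context=1.0, observed_outcome=0.42)  # error 0.02 <= tol
print([h.claim for h in mem.retrieve("ball_1")])
```

The key design choice mirrored here is that `retrieve` only ever returns hypotheses that passed a verification probe, so a shift in friction or stability invalidates stale knowledge instead of being silently applied.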