Task-Aware Scanning Parameter Configuration for Robotic Inspection Using Vision Language Embeddings and Hyperdimensional Computing

arXiv cs.CV / 5/6/2026


Key Points

  • The paper targets robotic laser profiling where measurement quality is heavily influenced by sensor configuration parameters that are currently tuned by trial and error.
  • It proposes an instruction-conditioned approach that uses a pre-scan RGB observation plus a natural-language inspection instruction to recommend a discrete set of sensing parameters for a robot-mounted profiler.
  • To support evaluation, the authors introduce Instruct-Obs2Param, a real-world multimodal dataset connecting inspection intents with multi-view pose/illumination variation across 16 objects and canonical parameter regimes.
  • They present ScanHD, a hyperdimensional computing framework that binds the instruction and observation into task-aware codes and performs associative, parameter-wise reasoning for fast, interpretable, low-latency configuration decisions (see the sketch after this list).
  • On Instruct-Obs2Param, ScanHD reports 92.7% average exact accuracy and 98.1% average Win@1 accuracy across the five parameters, outperforming rule-based heuristics, conventional multimodal models, and multimodal LLM baselines while also generalizing better across splits.
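
To make the hyperdimensional computing idea concrete, here is a minimal sketch of binding two embeddings into a task-aware code and querying per-parameter associative memories. It is not the authors' implementation: the dimensionality, the sign-based bipolar encoding, the random projection matrices, and the toy parameter names are illustrative assumptions.

```python
# Sketch of binding + associative-memory lookup in the spirit of ScanHD.
# NOT the paper's implementation: dimensions, encodings, and parameter
# names below are assumed for illustration.
import numpy as np

rng = np.random.default_rng(0)
D = 4096    # hypervector dimensionality (assumed)
EMB = 512   # size of the upstream vision/language embeddings (assumed)

# Random projections mapping dense embeddings to bipolar hypervectors.
P_instr = rng.standard_normal((D, EMB))
P_obs = rng.standard_normal((D, EMB))

def encode(x, P):
    """Project a dense embedding to a bipolar {-1, +1} hypervector."""
    return np.sign(P @ x)

def bind(h_instr, h_obs):
    """Element-wise binding of instruction and observation codes."""
    return h_instr * h_obs

def build_memory(codes, labels, n_classes):
    """One associative memory per parameter: a prototype per discrete
    regime, formed by bundling (summing) the codes with that label."""
    mem = np.zeros((n_classes, codes.shape[1]))
    for c, y in zip(codes, labels):
        mem[y] += c
    return mem

def recommend(task_code, memories):
    """Pick, per parameter, the regime whose prototype is most similar."""
    return {p: int(np.argmax(mem @ task_code)) for p, mem in memories.items()}

# Toy usage with random stand-ins for the instruction/observation embeddings.
instr_emb, obs_emb = rng.standard_normal(EMB), rng.standard_normal(EMB)
task_code = bind(encode(instr_emb, P_instr), encode(obs_emb, P_obs))

train_codes = np.sign(rng.standard_normal((50, D)))
memories = {
    "exposure_time": build_memory(train_codes, rng.integers(0, 4, 50), 4),
    "sampling_frequency": build_memory(train_codes, rng.integers(0, 3, 50), 3),
}
print(recommend(task_code, memories))
```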

Abstract

Robotic laser profiling is widely used for dimensional verification and surface inspection, yet measurement fidelity is often dominated by sensor configuration rather than robot motion. Industrial profilers expose multiple coupled parameters, including sampling frequency, measurement range, exposure time, receiver dynamic range, and illumination, that are still tuned by trial and error; mismatches can cause saturation, clipping, or missing returns that cannot be recovered downstream. We formulate instruction-conditioned sensing parameter recommendation: given a pre-scan RGB observation and a natural-language inspection instruction, infer a discrete configuration over the key parameters of a robot-mounted profiler. To benchmark this problem, we develop Instruct-Obs2Param, a real-world multimodal dataset linking inspection intents and multi-view pose and illumination variation across 16 objects to canonical parameter regimes. We then propose ScanHD, a hyperdimensional computing framework that binds instruction and observation into a task-aware code and performs parameter-wise associative reasoning with compact memories, matching discrete scanner regimes while yielding stable, interpretable, low-latency decisions. On Instruct-Obs2Param, ScanHD achieves 92.7% average exact accuracy and 98.1% average Win@1 accuracy across the five parameters, with strong cross-split generalization and low-latency inference suitable for deployment, outperforming rule-based heuristics, conventional multimodal models, and multimodal large language models. This work enables autonomous, instruction-conditioned sensing configuration from task intent and scene context, eliminating manual tuning and elevating sensor configuration from a static setting to an adaptive decision variable.
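
Read as a task specification, the abstract fixes an input pair (pre-scan RGB observation, natural-language instruction) and an output of one discrete regime per parameter. The hypothetical sketch below shows what such an interface could look like; the five parameter names come from the abstract, but the candidate value grids, field names, and the placeholder recommender are assumptions for illustration only.

```python
# Hypothetical interface for instruction-conditioned parameter recommendation.
# Parameter names follow the abstract; the value grids are invented.
from dataclasses import dataclass
from typing import Dict, List

import numpy as np

# Candidate regimes per parameter (illustrative; real profilers expose
# vendor-specific discrete settings).
PARAMETER_GRID: Dict[str, List] = {
    "sampling_frequency_hz": [100, 500, 1000, 2000],
    "measurement_range_mm": [25, 50, 100],
    "exposure_time_us": [50, 100, 250, 500],
    "receiver_dynamic_range": ["low", "medium", "high"],
    "illumination": ["low", "medium", "high"],
}

@dataclass
class ScanRequest:
    instruction: str          # e.g. "verify the groove depth on the machined flange"
    pre_scan_rgb: np.ndarray  # HxWx3 pre-scan observation of the scene

def recommend_configuration(request: ScanRequest) -> Dict[str, object]:
    """Placeholder recommender: a learned model such as ScanHD would map
    (instruction, observation) to one regime per parameter; here we simply
    return the middle regime of each grid to show the output format."""
    return {name: values[len(values) // 2] for name, values in PARAMETER_GRID.items()}

# Toy usage.
request = ScanRequest(
    instruction="inspect the weld seam for undercut on the steel bracket",
    pre_scan_rgb=np.zeros((480, 640, 3), dtype=np.uint8),
)
print(recommend_configuration(request))
```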