The Language of Touch: Translating Vibrations into Text with Dual-Branch Learning
arXiv cs.CV / 3/31/2026
Key Points
- The paper tackles vibrotactile captioning by generating natural-language descriptions directly from vibrotactile signals, addressing a key gap in semantic interpretation for haptic data.
- It introduces ViPAC, which uses a dual-branch learning strategy to separate periodic and aperiodic signal components and a dynamic fusion mechanism to integrate features adaptively.
- The method adds training constraints—an orthogonality constraint and weighting regularization—to improve feature complementarity and consistency in the fused representation.
- To enable evaluation, the authors build LMT108-CAP, the first vibrotactile-text paired dataset, by using GPT-4o to generate multiple constrained captions per surface image in the existing LMT-108 dataset.
- Experiments indicate ViPAC outperforms baseline approaches adapted from audio/image captioning, improving both lexical fidelity and semantic alignment between signals and generated text.
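The dual-branch idea above can be sketched in a few lines. This is a toy NumPy illustration of the general recipe, not the authors' implementation: every function name, the FFT-based periodic/aperiodic split, the energy-based fusion gates, and the cosine-based orthogonality penalty are assumptions standing in for ViPAC's learned components.

```python
# Hypothetical sketch (assumed details, not ViPAC's actual code):
# split a vibrotactile signal into periodic / aperiodic parts, encode
# each branch, fuse with dynamic weights, and penalize branch overlap.
import numpy as np

def split_periodic_aperiodic(x, k=8):
    """Keep the k strongest frequency bins as the 'periodic' part;
    the residual is treated as 'aperiodic'."""
    X = np.fft.rfft(x)
    keep = np.argsort(np.abs(X))[-k:]
    mask = np.zeros_like(X)
    mask[keep] = X[keep]
    periodic = np.fft.irfft(mask, n=len(x))
    return periodic, x - periodic

def encode(sig, dim=16, seed=0):
    """Stand-in encoder: a fixed random linear projection of the signal."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, len(sig))) / np.sqrt(len(sig))
    return W @ sig

def dynamic_fusion(f_p, f_a):
    """Toy 'dynamic' weighting: softmax gates from per-branch feature energy."""
    e = np.array([np.linalg.norm(f_p), np.linalg.norm(f_a)])
    w = np.exp(e - e.max())
    w /= w.sum()
    return w[0] * f_p + w[1] * f_a, w

def orthogonality_penalty(f_p, f_a):
    """Squared cosine similarity, driven toward 0 so the branches stay complementary."""
    cos = f_p @ f_a / (np.linalg.norm(f_p) * np.linalg.norm(f_a) + 1e-8)
    return cos ** 2

t = np.linspace(0, 1, 256, endpoint=False)
signal = np.sin(2 * np.pi * 30 * t) + 0.3 * np.random.default_rng(1).standard_normal(256)
p, a = split_periodic_aperiodic(signal)            # periodic + aperiodic = signal
fused, weights = dynamic_fusion(encode(p, seed=0), encode(a, seed=1))
```

In the paper's framing, the encoders and fusion gates are learned, and the orthogonality term plus a weighting regularizer are added to the training loss to keep the two branches complementary; the caption decoder then consumes the fused representation.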
