From 3D Pose to Prose: Biomechanics-Grounded Vision–Language Coaching

arXiv cs.CV / 3/31/2026

Key Points

BioCoach is introduced as a biomechanics-grounded vision–language framework that generates fitness coaching text from streaming video using 3D skeletal kinematics alongside visual appearance.
Its three-stage pipeline includes an exercise-specific degree-of-freedom selector, a structured biomechanical context that uses individualized morphometrics with cycle/constraint analysis, and a conditioned feedback module that uses cross-attention to produce precise actionable feedback.
The approach uses parameter-efficient training that freezes both the vision and language backbones, aiming for transparent, personalized reasoning instead of purely pattern-matching responses.
The paper adds the QEVD-bio-fit-coach benchmark (by augmenting QEVD-fit-coach with biomechanics-oriented feedback) and proposes a biomechanics-aware LLM judge metric for fair evaluation.
Results report improved coaching quality on QEVD-bio-fit-coach with gains in lexical and judgment metrics while maintaining temporal triggering, and also show text quality/correctness improvements on the original QEVD-fit-coach with near-parity timing.
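
To make the first two key points concrete, here is a minimal sketch of how a joint angle can be computed from 3D keypoints and how an exercise-specific selector might restrict analysis to a few degrees of freedom. It is an illustration under stated assumptions, not the paper's implementation: the keypoint names, the `JOINT_TRIPLETS` and `EXERCISE_DOFS` tables, and both function names are hypothetical.

```python
import math

# Hypothetical (parent, joint, child) keypoint triplets; names are illustrative only.
JOINT_TRIPLETS = {
    "left_knee": ("left_hip", "left_knee", "left_ankle"),
    "left_elbow": ("left_shoulder", "left_elbow", "left_wrist"),
}

# Hypothetical mapping from exercise to the degrees of freedom worth analyzing.
EXERCISE_DOFS = {
    "squat": ["left_knee"],
    "push_up": ["left_elbow"],
}


def joint_angle(a, b, c):
    """Return the angle in degrees at point b between segments b->a and b->c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate segment: two keypoints coincide")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))


def selected_angles(exercise, keypoints):
    """Compute angles only for the degrees of freedom tied to this exercise."""
    angles = {}
    for dof in EXERCISE_DOFS[exercise]:
        parent, joint, child = JOINT_TRIPLETS[dof]
        angles[dof] = joint_angle(keypoints[parent], keypoints[joint], keypoints[child])
    return angles
```

Calling `selected_angles("squat", keypoints)` on one frame returns only the knee angle, so later stages see a compact, exercise-relevant signal instead of the full skeleton.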

Abstract

We present BioCoach, a biomechanics-grounded vision–language framework for fitness coaching from streaming video. BioCoach fuses visual appearance and 3D skeletal kinematics through a novel three-stage pipeline: an exercise-specific degree-of-freedom selector that focuses analysis on salient joints; a structured biomechanical context that pairs individualized morphometrics with cycle and constraint analysis; and a vision–biomechanics conditioned feedback module that applies cross-attention to generate precise, actionable text. Using parameter-efficient training that freezes the vision and language backbones, BioCoach yields transparent, personalized reasoning rather than pattern matching. To enable learning and fair evaluation, we augment QEVD-fit-coach with biomechanics-oriented feedback to create QEVD-bio-fit-coach, and we introduce a biomechanics-aware LLM judge metric. BioCoach delivers clear gains on QEVD-bio-fit-coach across lexical and judgment metrics while maintaining temporal triggering; on the original QEVD-fit-coach, it improves text quality and correctness with near-parity timing, demonstrating that explicit kinematics and constraints are key to accurate, phase-aware coaching.
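
The abstract names cross-attention as the mechanism that conditions feedback on biomechanics, but it does not give the layer's details. The following is a generic single-head scaled dot-product cross-attention sketch in plain Python, offered only to illustrate the idea. It assumes that queries come from one stream (for example, visual or text tokens) and that keys and values come from the biomechanical context. It omits the learned projection matrices a real layer would have, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention without learned projections.

    queries: list of d-dimensional vectors from one stream.
    keys, values: equal-length lists of vectors from the other stream.
    Returns one output vector per query, each a weighted average of the values.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every context token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Mix the value vectors according to the attention weights.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs
```

Because every output is a convex combination of the value vectors, each query token ends up carrying information from whichever biomechanical context entries it matches most strongly.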
