LIMSSR: LLM-Driven Sequence-to-Score Reasoning under Training-Time Incomplete Multimodal Observations

arXiv cs.CV / 5/4/2026


Key Points

  • The paper studies incomplete multimodal learning (IML) in a more realistic training setting where some modalities are missing at training time, removing the usual assumption of full-modal reconstruction supervision.
  • It introduces LIMSSR, which reframes incomplete multimodal prediction as a conditional sequence-to-score reasoning problem and uses prompt-guided, context-aware modality imputation with an LLM to infer latent semantics.
  • Instead of direct reconstruction, LIMSSR fuses multidimensional representations to learn from only the available modalities and related context.
  • To reduce hallucinations, the method uses a Mask-Aware Dual-Path Aggregation mechanism that dynamically calibrates inference uncertainty.
  • Experiments on three Action Quality Assessment datasets show that LIMSSR significantly outperforms existing baselines without requiring complete training data, and the authors have released their code.
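The dual-path idea in the last-but-one point can be illustrated with a toy sketch. This is not the paper's implementation: the function name, the fixed 0.5 confidence gate, and the feature shapes are all hypothetical, chosen only to show how a modality mask can route observed features through a trusted path while down-weighting imputed ones.

```python
import numpy as np

def dual_path_aggregate(observed_feat, imputed_feat, mask):
    """Toy sketch of mask-aware dual-path aggregation (illustrative only).

    observed_feat, imputed_feat: (M, D) per-modality features, where
    imputed_feat holds LLM-inferred stand-ins for missing modalities.
    mask: (M,) with 1.0 if the modality was actually observed, else 0.0.

    Observed modalities pass through unchanged; imputed ones are shrunk
    by a confidence gate so hallucinated content cannot dominate fusion.
    """
    mask = mask[:, None]                       # (M, 1), broadcast over D
    # Hypothetical gate: observed features get weight 1, imputed get 0.5.
    gate = mask + (1.0 - mask) * 0.5
    fused = gate * (mask * observed_feat + (1.0 - mask) * imputed_feat)
    # Normalize by total gate mass so the output scale is mask-independent.
    return fused.sum(axis=0) / gate.sum()
```

With all modalities observed, the function reduces to a plain mean over modalities; as modalities go missing, their imputed replacements contribute, but only at reduced weight.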

Abstract

Real-world multimodal learning is often hindered by missing modalities. While Incomplete Multimodal Learning (IML) has gained traction, existing methods typically rely on the unrealistic assumption of full-modal availability during training to provide reconstruction supervision or cross-modal priors. This paper tackles the more challenging setting of IML under training-time incomplete observations, which precludes reliance on a "God's eye view" of complete data. We propose LIMSSR (LLM-Driven Incomplete Multimodal Sequence-to-Score Reasoning), a framework that reformulates this challenge as a conditional sequence reasoning task. LIMSSR leverages the semantic reasoning capabilities of Large Language Models via Prompt-Guided Context-Aware Modality Imputation and Multidimensional Representation Fusion to infer latent semantics from available contexts without direct reconstruction. To mitigate hallucinations, we introduce a Mask-Aware Dual-Path Aggregation to dynamically calibrate inference uncertainty. Extensive experiments on three Action Quality Assessment datasets demonstrate that LIMSSR significantly outperforms state-of-the-art baselines without relying on complete training data, establishing a new paradigm for data-efficient multimodal learning. Code is available at https://github.com/XuHuangbiao/LIMSSR.
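The "sequence-to-score" framing in the abstract can be sketched as a tiny pooling head: a variable-length sequence of fused multimodal tokens is reduced to a single scalar quality score. Everything here is a hypothetical stand-in (attention pooling, the projection vectors, the names), not the architecture from the paper.

```python
import numpy as np

def sequence_to_score(tokens, w_attn, w_score):
    """Toy sequence-to-score head (illustrative, not the paper's model).

    tokens:  (T, D) sequence of fused multimodal token embeddings.
    w_attn:  (D,) projection producing per-token relevance logits.
    w_score: (D,) projection mapping the pooled summary to a score.

    The sequence is reduced by softmax attention pooling, then mapped
    to one scalar, mirroring the idea of casting quality prediction as
    conditional reasoning over a token sequence.
    """
    logits = tokens @ w_attn                   # (T,) per-token relevance
    weights = np.exp(logits - logits.max())    # numerically stable softmax
    weights /= weights.sum()
    pooled = weights @ tokens                  # (D,) attention-weighted summary
    return float(pooled @ w_score)             # scalar quality score
```

Because the pooling is length-agnostic, the same head accepts sequences of different lengths, which is what lets missing modalities simply shorten (or substitute tokens in) the input rather than break the model.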