CGCMA: Conditionally-Gated Cross-Modal Attention for Event-Conditioned Asynchronous Fusion

arXiv cs.LG / 4/21/2026

Key Points

The paper studies asynchronous multimodal learning, where a continuous primary signal must be fused with delayed external context whose value depends on its arrival time and reliability.
It introduces CGCMA (Conditionally-Gated Cross-Modal Attention), in which text first attends over price sequences to ground event-relevant market states, and a lag-aware gate then regulates (or suppresses) residual cross-modal injection when web context is stale or contradictory.
The authors create CMI (Crypto Market Intelligence), an asynchronous evaluation dataset of 27,914 samples that pair high-frequency cryptocurrency price sequences with lagged real-news web intelligence.
On a short real-news evaluation set, CGCMA achieves the highest mean downstream Sharpe ratio (+0.449 ± 0.257) among the evaluated baselines under a shared zero-cost threshold-trading protocol, and ablations suggest the improvement is neither explained by web scalar features alone nor recovered by simple freshness heuristics.
Overall, the results support the validity of asynchronous cross-modal fusion as a problem setting and suggest a promising gain for CGCMA on this stress-test setup.
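
The two-stage idea in the points above (text-conditioned grounding, then a lag-aware gate on the residual) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the single-head attention, the cosine-similarity agreement score, the log-scaled lag input, the sigmoid gate, and every name here (gated_cross_modal_fusion, init_params, W_q, W_k, W_v, w_gate, b_gate) are assumptions made only for illustration, and the gate weights are hand-set so that a larger lag lowers the gate.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(d, k, seed=0):
    # Hypothetical parameters: random projections, hand-set gate weights.
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d)
    return {
        "W_q": rng.normal(0.0, scale, (d, d)),
        "W_k": rng.normal(0.0, scale, (d, d)),
        "W_v": rng.normal(0.0, scale, (d, d)),
        # Gate input order: [agreement, k web features, log(1 + lag)].
        # The negative last weight makes stale context close the gate.
        "w_gate": np.concatenate([[1.0], np.full(k, 0.5), [-2.0]]),
        "b_gate": 0.0,
    }

def gated_cross_modal_fusion(price_states, text_emb, web_feats, tau_lag, params):
    """price_states: (T, d), text_emb: (d,), web_feats: (k,), tau_lag: lag >= 0."""
    d = price_states.shape[1]

    # Stage 1: the text query attends over the price sequence.
    q = text_emb @ params["W_q"]              # (d,)
    keys = price_states @ params["W_k"]       # (T, d)
    values = price_states @ params["W_v"]     # (T, d)
    attn = softmax(keys @ q / np.sqrt(d))     # (T,) weights over time steps
    grounded = attn @ values                  # (d,) text-grounded market state

    # Unimodal fallback: here simply the most recent price state (assumption).
    h_price = price_states[-1]

    # Stage 2: gate from modality agreement, web features, and lag.
    denom = np.linalg.norm(grounded) * np.linalg.norm(h_price) + 1e-8
    agreement = float(grounded @ h_price) / denom
    gate_in = np.concatenate([[agreement], web_feats, [np.log1p(tau_lag)]])
    gate = float(sigmoid(gate_in @ params["w_gate"] + params["b_gate"]))

    # Residual injection: gate near 0 falls back to the unimodal state.
    fused = h_price + gate * grounded
    return fused, gate, attn
```

Calling the function twice on the same inputs, once with a small tau_lag and once with a very large one, shows the intended behaviour under these assumptions: the gate shrinks toward zero for the stale case and the fused vector approaches the price-only state.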

Abstract

We study asynchronous alignment, a first-class multimodal learning setting in which a dense primary stream must be fused with sporadic external context whose value depends on when it arrives. Unlike standard multimodal benchmarks that assume structural synchrony, this setting requires models to reason explicitly about freshness and trust. We focus on the event-conditioned case in which continuous market states are paired with delayed web intelligence, and we use high-frequency cryptocurrency markets only as a timestamped, high-noise stress test for this broader problem. We propose CGCMA (Conditionally-Gated Cross-Modal Attention), whose central design principle is to separate text-conditioned grounding from lag-aware trust control. Text first attends over price sequences to identify event-relevant market states, after which a conditional gate uses modality agreement, web features, and lag $\tau_{\mathrm{lag}}$ to regulate residual injection and fall back toward unimodal prediction when external context is stale or contradictory. We introduce CMI (Crypto Market Intelligence), an asynchronous evaluation corpus with 27,914 real-news samples pairing high-frequency price sequences with lagged web intelligence. On the current short real-news corpus, CGCMA attains the highest mean downstream Sharpe ratio ($+0.449 \pm 0.257$) among the evaluated baselines under a shared zero-cost threshold-trading evaluation on news-available bars. Additional controls show that the gain is not explained by web scalars alone and is not recovered by simple freshness heuristics. The resulting evidence supports problem validity and a promising asynchronous multimodal gain on this stress-test setting.
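
To make the evaluation metric concrete, the following is a minimal sketch of how a Sharpe ratio could be computed under a zero-cost threshold-trading rule restricted to news-available bars. The paper's exact protocol is not given here, so the symmetric long/short rule, the unannualised Sharpe definition with sample standard deviation, and all names (threshold_positions, sharpe_ratio, zero_cost_threshold_sharpe, news_mask) are assumptions for illustration only.

```python
import numpy as np

def threshold_positions(scores, threshold):
    # Assumed rule: long above +threshold, short below -threshold, else flat.
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > threshold, 1.0,
                    np.where(scores < -threshold, -1.0, 0.0))

def sharpe_ratio(returns, periods_per_year=1.0):
    # Mean over sample standard deviation, optionally annualised.
    returns = np.asarray(returns, dtype=float)
    if returns.size < 2:
        return 0.0
    std = returns.std(ddof=1)
    if std == 0.0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)

def zero_cost_threshold_sharpe(scores, next_returns, threshold=0.0, news_mask=None):
    # Zero-cost: per-bar PnL is position times next-bar return, with no fees
    # or slippage subtracted.
    pnl = threshold_positions(scores, threshold) * np.asarray(next_returns, dtype=float)
    if news_mask is not None:
        # Keep only bars where news context was available.
        pnl = pnl[np.asarray(news_mask, dtype=bool)]
    return sharpe_ratio(pnl)
```

Because every model is scored with the same threshold rule and no transaction costs, differences in this number reflect differences in the prediction scores rather than in execution assumptions; it should not be read as an estimate of tradable performance.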
