SWAN: World-Aware Adaptive Multimodal Networks for Runtime Variations

arXiv cs.LG, April 30, 2026


Key Points

  • The paper introduces SWAN, a sample- and world-aware adaptive multimodal neural network designed to handle real-world runtime variations such as modality quality changes, input complexity shifts, and fluctuating compute resources.
  • SWAN combines a quality-aware controller (to allocate computation across modalities under a user-specified max budget), an adaptive gating module (to scale layer usage based on sample complexity), and a token-dropping module (to mask semantically irrelevant multimodal features) to improve compute efficiency.
  • The approach targets a key limitation of existing methods, which often fail to simultaneously respect strict compute budgets, account for input complexity, and adapt to multiple runtime factors.
  • Experiments on complex multi-object 3D detection in autonomous driving show up to a 49% reduction in FLOPs with minimal performance degradation.
  • The work positions SWAN as an early research advance toward more robust multimodal inference pipelines that maximize the value of compute spent under constraints.
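To make the three-module pipeline concrete, here is a minimal, hypothetical sketch of the control logic the key points describe: quality-proportional budget allocation across modalities, complexity-scaled layer usage, and relevance-based token dropping. All names, thresholds, and formulas below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of SWAN-style runtime control logic.
# The proportional-allocation and rounding heuristics are assumptions
# for illustration; the paper's learned controllers are more involved.
from dataclasses import dataclass


@dataclass
class Modality:
    name: str
    quality: float  # estimated input quality in [0, 1]


def allocate_budget(modalities, max_flops):
    """Quality-aware controller: split a user-specified FLOP budget
    across modalities in proportion to their estimated quality."""
    total_quality = sum(m.quality for m in modalities)
    return {m.name: max_flops * m.quality / total_quality
            for m in modalities}


def select_depth(complexity, max_layers):
    """Adaptive gating: spend more layers on harder samples,
    always executing at least one layer."""
    return max(1, round(complexity * max_layers))


def drop_tokens(relevance_scores, keep_ratio):
    """Token dropping: keep only the indices of the most
    semantically relevant multimodal features."""
    k = max(1, int(len(relevance_scores) * keep_ratio))
    ranked = sorted(range(len(relevance_scores)),
                    key=lambda i: -relevance_scores[i])
    return ranked[:k]
```

For example, with a degraded LiDAR stream (quality 0.3) and a clean camera stream (quality 0.9), the controller above would route three quarters of the budget to the camera branch, while easy samples would run through a shallow stack and low-relevance tokens would be masked before detection.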

Abstract

Multimodal deep neural networks deployed in realistic environments must contend with runtime variations: changes in modality quality, overall input complexity, and available platform resources. Current networks struggle with such fluctuations -- adaptive networks cannot adhere to a strict compute budget, controller-based networks neglect to consider input complexity, and statically provisioned networks fail at all the above. Consequently, they do not extract maximum utility from the expended computational resources. We present SWAN (Sample and World-Aware Multimodal Network), the first adaptive multimodal network that accomplishes all three goals. SWAN employs a quality-aware controller to assign resources among modalities according to a variable user-specified maximum budget. Within this budget, an adaptive gating module further optimizes efficiency by scaling layer utilization according to sample complexity. For further gains, SWAN also employs a token dropping module that masks semantically irrelevant multimodal features before performing detections. We evaluate SWAN in the domain of autonomous driving with complex multi-object 3D detection, reducing FLOPs by up to 49% with minimal degradation.