Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models
arXiv cs.AI / 4/21/2026
Key Points
- The paper studies an underexplored issue in native omni-modal LLMs (OLLMs): how these models develop a “modality preference” when both text and vision are available.
- It introduces a conflict-based benchmark and a modality selection rate metric (see the first sketch after this list) to systematically quantify modality preference across ten representative OLLMs.
- The results show a shift away from the text-dominance typical of traditional VLMs toward a strong visual preference in most OLLMs.
- Layer-wise probing indicates that the preference is not fixed from the start but emerges gradually in the mid-to-late layers (see the second sketch below).
- The authors use these internal signals to diagnose cross-modal hallucinations and report competitive results on three downstream multimodal benchmarks without task-specific training data.
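The summary does not give the paper’s exact definition of the modality selection rate. A minimal Python sketch, assuming the metric is simply the fraction of conflict items on which the model’s answer sides with a given modality; the `responses` encoding here is a hypothetical convention, not the paper’s protocol:

```python
from collections import Counter

def modality_selection_rate(responses):
    """Fraction of model answers that side with each modality under conflict.

    `responses` is a list with one label per conflict item:
    'text'   - the answer matches the textual cue,
    'vision' - the answer matches the visual cue,
    'other'  - the answer matches neither (e.g., refusal or hallucination).
    This encoding is an assumption for illustration.
    """
    counts = Counter(responses)
    total = len(responses)
    return {m: counts.get(m, 0) / total for m in ("text", "vision", "other")}

# Example: a model that mostly follows the image when text and vision disagree.
rates = modality_selection_rate(["vision", "vision", "text", "vision", "other"])
print(rates)  # {'text': 0.2, 'vision': 0.6, 'other': 0.2}
```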
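Likewise, a rough sketch of what layer-wise probing typically looks like: fit one linear probe per layer on hidden states and check at which depth the model’s eventual modality choice becomes decodable. The array shapes, label encoding, and synthetic data below are assumptions for illustration, not the paper’s setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layers(hidden_states, labels):
    """Fit a linear probe per layer and report held-out accuracy.

    hidden_states: shape (num_layers, num_examples, hidden_dim), e.g. the
    residual stream at the final token of each conflict prompt (assumed).
    labels: which modality the model ultimately followed (0=text, 1=vision).
    """
    accuracies = []
    for layer_acts in hidden_states:
        X_tr, X_te, y_tr, y_te = train_test_split(
            layer_acts, labels, test_size=0.3, random_state=0
        )
        probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        accuracies.append(probe.score(X_te, y_te))
    return accuracies

# Synthetic demo: inject a separable direction only in mid-to-late layers,
# so probe accuracy stays near chance early and rises from layer 6 onward.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.normal(size=(12, 200, 64))
for layer in range(6, 12):
    acts[layer, labels == 1, 0] += 2.0
print([round(a, 2) for a in probe_layers(acts, labels)])
```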