Mitigating Multimodal LLMs Hallucinations via Relevance Propagation at Inference Time
arXiv cs.LG / 5/5/2026
Key Points
- The paper addresses hallucinations in multimodal LLMs, attributing them to over-reliance on textual tokens during inference, which weakens grounding in perceptual inputs (vision/audio).
- It introduces LIME (Learning Inference-time Modality Enhancement), a training-free method that uses Layer-wise Relevance Propagation (LRP) to measure token-level contributions and drive the model toward higher perceptual reliance.
- LIME enforces its relevance-based goal via inference-time updates to key-value representations, without changing model parameters or requiring extra training data.
- Experiments on multiple vision and audio multimodal benchmarks show that LIME consistently reduces hallucinations and improves grounding while maintaining overall generation quality.
- The analysis indicates that LIME increases modality contribution and yields more localized, semantically aligned relevance patterns.
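To make the core mechanism concrete, here is a toy sketch of the general idea: estimate how much the perceptual (e.g. visual) tokens contribute to an attention step, and, if their contribution is low, nudge their cached key vectors at inference time so the model attends to them more. This is an illustrative stand-in only: it uses attention weights as a crude relevance proxy (not actual Layer-wise Relevance Propagation) and shifts visual keys along the query direction, and all names (`relevance_guided_boost`, `alpha`, etc.) are invented for this sketch, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, K, V):
    # single-query scaled dot-product attention over cached keys/values
    scores = (K @ q) / np.sqrt(q.shape[-1])
    w = softmax(scores)
    return w @ V, w

def relevance_guided_boost(q, K, is_visual, alpha=0.5):
    """Toy stand-in for relevance-guided KV editing: if the visual
    tokens' total attention mass (a crude relevance proxy, not LRP)
    is low, shift their keys along the query direction, which adds a
    positive bias to their attention scores on the next pass."""
    d = q.shape[-1]
    _, w = attention(q, K, np.eye(K.shape[0], d))
    visual_mass = w[is_visual].sum()
    delta = alpha * (1.0 - visual_mass)          # boost shrinks as mass grows
    K2 = K.copy()
    K2[is_visual] += delta * np.sqrt(d) * q / (q @ q)  # raises each visual score by delta
    return K2

rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=d)
K = rng.normal(size=(6, d))
V = rng.normal(size=(6, d))
is_visual = np.array([True, True, False, False, False, False])

_, w_before = attention(q, K, V)
K2 = relevance_guided_boost(q, K, is_visual)
_, w_after = attention(q, K2, V)
print("visual attention mass:", w_before[is_visual].sum(), "->", w_after[is_visual].sum())
```

Because the key shift adds the same positive bias to every visual token's score while leaving the others untouched, the visual tokens' share of attention is guaranteed to rise; the paper's actual method is training-free in the same spirit, but uses LRP-derived relevance rather than attention weights.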