One Token, Two Fates: A Unified Framework via Vision Token Manipulation Against MLLMs Hallucination
arXiv cs.CV / 3/12/2026
Key Points
- The paper critiques existing training-free methods for reducing MLLM hallucination, noting that strengthening visual evidence or suppressing language priors in isolation trades off performance and can introduce noise.
- It proposes a unified framework focused on vision tokens, built around two latent-representation modules: Synergistic Visual Calibration (SVC) and Causal Representation Calibration (CRC).
- SVC uses augmented visual tokens to strengthen visuals, while CRC prunes tokens to create latent-space negative samples for correcting internal model biases.
- The approach restores the vision-language balance and reports roughly a 2% absolute improvement on POPE with LLaVA-1.5 across multiple benchmark settings, at about a 1.06x inference latency overhead.
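The pairing described above, where a vision-strengthened pass (SVC) acts as a positive signal and a token-pruned pass (CRC) exposes language-prior bias as a negative sample, resembles a contrastive logit combination. Below is a minimal, hypothetical sketch of that idea; the function name, the `alpha` contrast weight, and the plain-list logits are illustrative assumptions, not the paper's actual implementation.

```python
def calibrated_logits(logits_pos, logits_neg, alpha=1.0):
    # Contrastive combination: amplify evidence from the pass with
    # augmented visual tokens (positive) and subtract the bias revealed
    # by the pass with pruned visual tokens (negative).
    # alpha is a hypothetical contrast strength, not from the paper.
    return [(1 + alpha) * p - alpha * n for p, n in zip(logits_pos, logits_neg)]

# Toy example: a token scoring high even without visual evidence
# (prior-driven) is down-weighted relative to a visually grounded one.
pos = [2.0, 1.0]   # logits with augmented visual tokens
neg = [1.8, 0.2]   # logits with pruned visual tokens
print(calibrated_logits(pos, neg))  # approximately [2.2, 1.8]
```

In this toy case, the first token's large score mostly survives pruning (prior-driven), so it gains little from calibration, while the visually grounded second token is relatively boosted.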