Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling
arXiv cs.CV / 4/7/2026
Key Points
- The paper addresses instability in unsupervised self-evolution for multimodal LLMs, arguing that majority-vote pseudo-labeling can reinforce intrinsic model biases rather than true correctness.
- It proposes Continuous Softened Retracing reSampling (CSRS), whose retracing re-inference mechanism (RRM) re-infers from anchor points to better explore long-tail reasoning trajectories.
- CSRS introduces Softened Frequency Reward (SFR), using continuous, frequency-calibrated reward signals instead of binary feedback to reduce degradation during post-training.
- To prevent over-reliance on superficial multimodal cues, the method incorporates Visual Semantic Perturbation (VSP) to steer the model toward mathematical/logical reasoning.
- Experiments report significantly improved reasoning performance for Qwen2.5-VL-7B on benchmarks like MathVision and state-of-the-art results on geometric tasks, with code provided on GitHub.
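The contrast between binary majority-vote rewards and the softened, frequency-calibrated signal can be sketched as follows. The paper's exact SFR formula is not given in this summary, so `softened_frequency_reward` (including the `temperature` parameter and the normalization) is a hypothetical illustration of the general idea: low-frequency answers receive a nonzero, continuous reward rather than a hard zero.

```python
from collections import Counter

def binary_majority_reward(answers, candidate):
    """Majority-vote pseudo-labeling: reward 1 only if the candidate
    matches the most frequent sampled answer, else 0. This hard signal
    is what the paper argues can reinforce intrinsic model biases."""
    majority, _ = Counter(answers).most_common(1)[0]
    return 1.0 if candidate == majority else 0.0

def softened_frequency_reward(answers, candidate, temperature=2.0):
    """Hypothetical softened variant: the reward scales with the
    candidate's empirical frequency, tempered (freq ** (1/temperature))
    to flatten the distribution so long-tail answers still get signal,
    then renormalized so rewards over observed answers sum to 1."""
    counts = Counter(answers)
    n = len(answers)
    powered = {a: (c / n) ** (1.0 / temperature) for a, c in counts.items()}
    z = sum(powered.values())
    return powered.get(candidate, 0.0) / z
```

With sampled answers `["A", "A", "A", "B"]`, the binary scheme gives "B" a reward of exactly 0, while the softened scheme still assigns it a positive, frequency-dependent reward.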