Towards Robust Endogenous Reasoning: Unifying Drift Adaptation in Non-Stationary Tuning
arXiv cs.LG / 4/20/2026
Key Points
- The paper identifies a previously underexplored vulnerability in reinforcement fine-tuning (RFT) of multimodal LLMs (MLLMs): endogenous reasoning drift, where the model's internal thinking and perception distributions shift unpredictably during autoregressive generation, even without external perturbations.
- It formalizes endogenous reasoning drift in RFT as multi-modal concept drift and proposes Counterfactual Preference Optimization++ (CPO++), which combines counterfactual reasoning with domain knowledge to apply controlled perturbations to both the thinking and perception channels.
- CPO++ uses preference optimization to disentangle spurious correlations from genuine reasoning signals, aiming to improve reasoning coherence and decision accuracy under non-stationary conditions.
- Experiments in two highly dynamic, safety-critical settings—medical diagnosis and autonomous driving—show improved performance and stronger robustness to extreme interference, along with strong zero-shot cross-domain generalization.
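The summary above does not give the paper's exact training objective. As a hedged illustration only, the preference-optimization step can be sketched as a DPO-style loss in which the coherent reasoning trace is preferred over a counterfactually perturbed one; all function and parameter names below are hypothetical, and the scalar log-probabilities stand in for what a real pipeline would compute from model and reference-model likelihoods.

```python
import math

def preference_loss(logp_chosen, logp_rejected,
                    ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO-style preference loss (hypothetical sketch, not the paper's
    exact CPO++ objective): push the policy to assign relatively higher
    likelihood to the coherent trace than to the counterfactually
    perturbed one, measured against a frozen reference model."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): small when the policy's preference margin
    # over the reference is large and positive.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy slightly prefers the coherent trace already,
# so the margin is positive and the loss falls below log(2).
loss = preference_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                       ref_logp_chosen=-13.0, ref_logp_rejected=-14.5)
```

In a full pipeline, the "rejected" trace would come from the controlled perturbations of thinking and perception that the key points describe, so the loss explicitly penalizes reliance on the perturbed (spurious) signal.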