Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE
arXiv cs.CL / 4/1/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper studies how Direct Preference Optimization (DPO) performs for multimodal sequential recommendation when implicit feedback makes unobserved items unreliable negatives.
- It finds that replacing deterministic hard negatives with stochastic sampling from a dynamic top-K candidate pool improves ranking consistently.
- The improvement is attributed to reducing harmful gradients from false negatives while preserving the useful signal of hard negatives and smoothing training through controlled randomness.
- Using an optional sparse Mixture-of-Experts (MoE) encoder, the proposed RoDPO method reaches up to 5.25% NDCG@5 gains on three Amazon benchmarks with nearly unchanged inference cost.
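The negative-sampling idea in the points above can be sketched in a few lines. This is an illustrative sketch, not the paper's exact RoDPO procedure: the pool size `k`, uniform sampling, and the dictionary-based scoring interface are all assumptions made for the example. Instead of always pairing the positive with the single highest-scored unobserved item (a deterministic hard negative that may in fact be a false negative under implicit feedback), a negative is drawn at random from a dynamic top-K pool of unobserved candidates, and the pair is scored with a DPO-style pairwise logistic loss.

```python
import math
import random

def sample_negative(scores, observed, k=10, rng=random):
    """Stochastic negative sampling (illustrative sketch).

    `scores`: dict mapping item id -> current model score.
    `observed`: set of items with observed (positive) feedback.
    Rather than returning the argmax unobserved item, sample uniformly
    from the top-K unobserved candidates, which spreads gradient mass
    away from potential false negatives.
    """
    candidates = [i for i in sorted(scores, key=scores.get, reverse=True)
                  if i not in observed][:k]
    return rng.choice(candidates)

def dpo_pair_loss(score_pos, score_neg, beta=1.0):
    """Pairwise DPO-style logistic loss: -log sigmoid(beta * (s_pos - s_neg))."""
    return -math.log(1.0 / (1.0 + math.exp(-beta * (score_pos - score_neg))))
```

Under this sketch, a hard-negative baseline corresponds to `k=1`, so the pool size directly interpolates between deterministic hard negatives and broader stochastic sampling.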
Related Articles

Knowledge Governance For The Agentic Economy.
Dev.to

AI server farms heat up the neighborhood for miles around, paper finds
The Register

Does the Claude “leak” actually change anything in practice?
Reddit r/LocalLLaMA

87.4% of My Agent's Decisions Run on a 0.8B Model
Dev.to

"Paperclip": a free tool that turns AI agents into a software team
Dev.to