DynamicPO: Dynamic Preference Optimization for Recommendation
arXiv cs.AI / 5/4/2026
💬 Opinion · Tools & Practical Usage · Models & Research
Key Points
- The paper shows that in LLM-based recommendation, increasing the number of negative samples in direct preference optimization (DPO) can paradoxically degrade performance even as training loss keeps decreasing.
- It attributes this “preference optimization collapse” to gradient suppression: overly easy negatives dominate the loss, while the boundary-critical negatives that actually delineate a user's preferences receive vanishing gradient signal (see the numerical sketch after this list).
- To address the issue, the authors propose DynamicPO, a plug-and-play framework with Dynamic Boundary Negative Selection, which prioritizes informative near-boundary negatives.
- DynamicPO also introduces Dual-Margin Dynamic beta Adjustment, which varies the per-sample optimization strength according to boundary ambiguity; both components are sketched below.
- Experiments on three public datasets indicate that DynamicPO prevents the optimization collapse and improves recommendation accuracy with negligible computational overhead; the code is publicly released.
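
To make the gradient-suppression claim concrete, here is a minimal numerical sketch of pairwise DPO with several negatives. All reward values are invented for illustration; the point is that each pair's gradient contribution scales with sigmoid(-β · margin), so a batch dominated by easy negatives drowns out the one near-boundary negative.

```python
import torch

beta = 1.0
# Hypothetical implicit rewards r = log pi_theta(y|x) - log pi_ref(y|x) for one
# preferred item and K = 4 sampled negatives. Values are invented: three negatives
# are already ranked far below the positive, one sits near the boundary.
r_pos = torch.tensor(2.0)
r_negs = torch.tensor([-6.0, -5.0, -4.0, 1.8])

margins = r_pos - r_negs  # reward margin per (positive, negative) pair
loss = -torch.nn.functional.logsigmoid(beta * margins).mean()

# In pairwise DPO the gradient magnitude contributed by each pair scales with
# sigmoid(-beta * margin): it vanishes for large-margin (easy) negatives, so the
# averaged update barely moves the single boundary-critical pair as K grows.
grad_weights = torch.sigmoid(-beta * margins)
for m, w in zip(margins.tolist(), grad_weights.tolist()):
    print(f"margin={m:+.2f}  gradient weight={w:.4f}")
print(f"loss={loss.item():.4f}")
```

Running this prints gradient weights of roughly 0.0003 to 0.0025 for the three easy negatives versus about 0.45 for the near-boundary one, so averaging over many easy pairs lets the loss keep falling while the informative pair is barely optimized.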
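And here is a hypothetical sketch of the two DynamicPO ideas as the summary describes them. The function name, the top-m smallest-margin selection rule, and the linear β interpolation are assumptions made for illustration, not the paper's exact formulation.

```python
import torch

def dynamicpo_style_loss(r_pos, r_negs, top_m=2, beta_lo=0.1, beta_hi=1.0, tau=1.0):
    """Illustrative sketch of the two DynamicPO components (not the paper's math).

    1) Boundary negative selection: keep only the top_m negatives with the
       smallest reward margin to the positive, i.e. the most ambiguous pairs.
    2) Dynamic beta: interpolate beta between beta_hi (ambiguous, margin near 0)
       and beta_lo (clear-cut, large margin) separately for each selected pair.
    """
    margins = r_pos - r_negs  # (K,) reward margin per pair
    # 1) select the hardest, near-boundary negatives
    hard_margins, _ = torch.topk(margins, k=top_m, largest=False)
    # 2) ambiguity score in [0, 1]: ~1 at zero margin, -> 0 as the pair gets easy
    ambiguity = (2.0 * torch.sigmoid(-hard_margins / tau)).clamp(max=1.0)
    beta = beta_lo + (beta_hi - beta_lo) * ambiguity  # per-pair strength
    return -torch.nn.functional.logsigmoid(beta * hard_margins).mean()

r_pos = torch.tensor(2.0)
r_negs = torch.tensor([-6.0, -5.0, -4.0, 1.8])
print(f"loss={dynamicpo_style_loss(r_pos, r_negs).item():.4f}")
```

With the same toy rewards as above, the near-boundary pair (margin 0.2) is selected and assigned a β close to beta_hi, while the easiest surviving pair gets a much smaller β, concentrating optimization strength where the preference boundary is ambiguous; whether the paper scales β up or down with ambiguity is an assumption here.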