When Can You Poison Rewards? A Tight Characterization of Reward Poisoning in Linear MDPs
arXiv cs.LG / 4/14/2026
💬 Opinion · Ideas & Deep Analysis · Models & Research
Key Points
- The paper analyzes “reward poisoning” in reinforcement learning, where an adversary alters rewards within a limited budget to steer an agent toward attacker-chosen behaviors.
- It gives the first tight, necessary-and-sufficient characterization of when a linear MDP instance is attackable via reward poisoning, separating vulnerable instances from intrinsically robust ones.
- The characterization draws a "bright line": on the robust side, no attacker-chosen policy can be forced without prohibitive attack cost, even when the agent runs standard (non-robust) RL algorithms (a simplified sketch of this budget threshold follows the list).
- Beyond exact linear MDPs, the authors argue that approximating deep RL environments as linear MDPs makes the framework general enough both to diagnose vulnerability and to attack susceptible environments efficiently in practice.
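To make the budget threshold concrete, here is a minimal sketch of the attack geometry in the simplest degenerate case: a one-state, known-reward problem (a bandit), where poisoning reduces to shifting per-arm rewards. The function `minimal_linf_poisoning`, the optimality margin, and the per-entry L-infinity budget are illustrative assumptions for this sketch, not the paper's construction; the paper's actual characterization covers full linear MDPs with features and transitions.

```python
import numpy as np

def minimal_linf_poisoning(r, target, margin=0.01):
    """Smallest L-inf reward perturbation making arm `target` optimal
    by `margin`, in a one-state (bandit) reduction of the problem."""
    r = np.asarray(r, dtype=float)
    others = np.arange(len(r)) != target
    # largest margin-adjusted amount by which any competitor beats the target
    worst_gap = (r[others] - r[target] + margin).max()
    # split that gap evenly: raise the target by half, lower competitors to match
    lift = max(0.0, worst_gap / 2.0)
    delta = np.zeros_like(r)
    delta[target] = lift
    # each competitor is lowered only as far as needed (never raised)
    delta[others] = np.minimum(0.0, (r[target] + lift - margin) - r[others])
    return delta

true_rewards = np.array([1.0, 0.4, 0.9])  # attacker wants arm 1 to look best
budget = 0.25                             # attacker's per-entry L-inf budget
delta = minimal_linf_poisoning(true_rewards, target=1)
cost = np.abs(delta).max()
print(f"perturbation {np.round(delta, 3)}, L-inf cost {cost:.3f}")
print("attackable within budget" if cost <= budget else "robust at this budget")
```

Splitting the worst gap evenly between raising the target arm and lowering its strongest competitor is L-infinity-optimal here, so the minimal cost is half the largest margin-adjusted gap; in this toy setting, an instance whose gap exceeds twice the budget falls on the robust side of the bright line.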
Related Articles

Don't forget, there is more than forgetting: new metrics for Continual Learning
Dev.to

Microsoft MAI-Image-2-Efficient Review 2026: The AI Image Model Built for Production Scale
Dev.to
Bit of a strange question?
Reddit r/artificial

One URL for Your AI Agent: HTML, JSON, Markdown, and an A2A Card
Dev.to