Code of Project: https://github.com/paulo101977/notebooks-rl/tree/main/re_requiem

I’ve been working on training an agent to play a segment of Resident Evil Requiem, focusing on a fast-paced, semi-linear escape sequence with enemies and time pressure. Instead of doing full reinforcement learning from scratch, I used a hybrid approach: Behavior Cloning (BC) on gameplay demonstrations to bootstrap an initial policy, followed by HG-DAgger to iteratively refine it.
The environment is based on gameplay capture, where I map controller inputs into a discretized action space. Observations are extracted directly from frames (with some preprocessing), and the agent learns to mimic and then refine behavior over time. One of the main challenges was instability early on, especially when the agent deviated slightly from the demonstrated trajectories (the classic BC compounding-error issue). HG-DAgger helped a lot by correcting those off-distribution states. Another tricky part was synchronizing actions with what’s actually happening on screen, since even small timing mismatches can completely break learning in this kind of game. After training, the agent is able to navigate the escape sequence consistently, react to enemies in real time, and recover from small deviations.
I’m still experimenting with improving robustness and generalization (right now it’s quite specialized to this segment). Happy to share more details (training setup, preprocessing, action space, etc.) if anyone’s interested.
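The post doesn’t include code, but the HG-DAgger correction loop it describes can be sketched in a few lines. This is a minimal toy illustration, not the author’s implementation: `hg_dagger`, the 1-D "position" environment, and all the lambda names below are hypothetical stand-ins for the real policy, human gate, and game environment.

```python
def hg_dagger(policy_action, expert_action, expert_gate, env_step, init_obs,
              dataset, horizon=50):
    """One HG-DAgger rollout: the learned policy stays in control until the
    human 'gate' flags the state as off-distribution; then the expert takes
    over, and only those intervened (obs, expert_action) pairs are added to
    the training dataset for the next round of supervised learning."""
    obs = init_obs
    for _ in range(horizon):
        if expert_gate(obs):              # human judges the state unsafe
            act = expert_action(obs)      # expert takes control
            dataset.append((obs, act))    # label only the intervened states
        else:
            act = policy_action(obs)      # policy keeps control, no new labels
        obs = env_step(obs, act)          # advance the environment
    return dataset

# Toy demo: a 1-D "position" world. The policy drifts +1 each step; the
# gate flags positions above 2, and the expert steers back with -1.
demo_data = hg_dagger(
    policy_action=lambda obs: 1,
    expert_action=lambda obs: -1,
    expert_gate=lambda obs: obs > 2,
    env_step=lambda obs, act: obs + act,
    init_obs=0,
    dataset=[],
    horizon=10,
)
```

The key property (and the difference from vanilla DAgger) is that the expert labels only the states where the human chose to intervene, which is what corrects the off-distribution drift the post mentions without relabeling every frame.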
Training an AI to play Resident Evil Requiem using Behavior Cloning + HG-DAgger [P]
Reddit r/MachineLearning / 4/12/2026
Key Points
- The article describes a project training an AI agent to play a segment of *Resident Evil Requiem* using a hybrid imitation-learning approach rather than full reinforcement learning from scratch.
- The agent first learns an initial policy from human gameplay demonstrations via Behavior Cloning, then improves iteratively with HG-DAgger to reduce compounding errors from off-distribution states.
- The training pipeline uses gameplay capture, discretizes controller inputs into an action space, and extracts observations directly from video frames with preprocessing.
- Key challenges highlighted include instability early in training when the agent deviates from demonstrations (a classic BC failure mode) and the difficulty of synchronizing action timing with on-screen events.
- After training, the agent can navigate the target escape sequence more consistently, react to enemies in real time, and recover somewhat from small deviations, though it remains specialized with limited generalization.
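The summary above mentions discretizing controller inputs into an action space. The post doesn’t say how, so here is one plausible sketch under stated assumptions: the `ACTIONS` list, the `discretize` function, the dead-zone value, and the button-priority rules are all hypothetical, not the project’s actual mapping.

```python
# Hypothetical discrete action set for a small imitation-learning policy head.
ACTIONS = ["noop", "forward", "back", "left", "right", "sprint", "interact"]

def discretize(stick_x, stick_y, sprint_pressed, interact_pressed,
               dead_zone=0.25):
    """Map raw analog-stick axes (each in [-1, 1]) plus two buttons onto one
    discrete action. Buttons take priority over movement; otherwise the
    stick's dominant axis picks the direction, with a dead zone for noise."""
    if interact_pressed:
        return "interact"
    if sprint_pressed and stick_y > dead_zone:
        return "sprint"                       # sprint only while pushing forward
    if abs(stick_x) < dead_zone and abs(stick_y) < dead_zone:
        return "noop"                         # stick at rest
    if abs(stick_y) >= abs(stick_x):
        return "forward" if stick_y > 0 else "back"
    return "right" if stick_x > 0 else "left"
```

A mapping like this turns behavior cloning into a plain classification problem over `ACTIONS`, at the cost of losing fine analog control, which is usually an acceptable trade for a semi-linear escape sequence.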