Hey everyone, I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community. The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could go in the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond imitation. A few interesting challenges came up: remapping actions from a MultiBinary representation to emulator inputs, trajectory alignment/offset bugs, and LSTM behavior that differed between evaluation and manual rollouts.
The agent can already make some progress, but still struggles with consistency and survival. I’d love to hear thoughts on BC with limited data, transitioning from BC to PPO, and handling partial observability.
The code is shared in a GitHub repository if you want to see the full process and results. Any feedback is very welcome!
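For readers new to the setup, behavior cloning amounts to supervised learning on recorded (observation, action) pairs. The sketch below is not the author's code: it is a minimal, dependency-free illustration that assumes a tiny hand-made feature vector and a two-button action, and fits one logistic classifier per button. The feature meanings, button names, and hyperparameters are all hypothetical; a real agent would learn from frames with a neural network.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_bc(demos, n_features, n_buttons, epochs=200, lr=0.5):
    """Fit one logistic classifier per button on (observation, action) pairs.

    demos: list of (obs, action), where obs is a list of floats and
    action is a list of 0/1 button flags (a MultiBinary-style vector).
    """
    # weights[b] holds n_features weights plus a trailing bias for button b.
    weights = [[0.0] * (n_features + 1) for _ in range(n_buttons)]
    for _ in range(epochs):
        for obs, action in demos:
            x = list(obs) + [1.0]  # constant input for the bias term
            for b in range(n_buttons):
                p = sigmoid(sum(w * xi for w, xi in zip(weights[b], x)))
                grad = p - action[b]  # d(binary cross-entropy)/d(logit)
                for i in range(n_features + 1):
                    weights[b][i] -= lr * grad * x[i]
    return weights

def act(weights, obs, threshold=0.5):
    """Return the cloned policy's button vector for one observation."""
    x = list(obs) + [1.0]
    return [
        int(sigmoid(sum(w * xi for w, xi in zip(wb, x))) > threshold)
        for wb in weights
    ]

# Hypothetical toy demonstrations.
# Features: [enemy_in_range, enemy_to_the_right]; buttons: [attack, move_right].
demos = [
    ([1.0, 0.0], [1, 0]),
    ([0.0, 1.0], [0, 1]),
    ([1.0, 1.0], [1, 1]),
    ([0.0, 0.0], [0, 0]),
]
policy = train_bc(demos, n_features=2, n_buttons=2)
```

Because the loss only measures agreement with the demonstrator on states the demonstrator visited, small mistakes at play time can lead the agent into states it never saw, which is one common explanation for the consistency problems described in the post.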
I Trained an AI to Beat Final Fight… Here’s What Happened [P]
Reddit r/MachineLearning / 5/4/2026
Key Points
- The author trained an AI agent for the arcade game Final Fight using behavior cloning from demonstrations, then evaluated its progress in the first stage.
- They encountered several practical training and evaluation issues, including action-space remapping from a MultiBinary representation to emulator inputs, trajectory alignment/offset bugs, and different LSTM behavior between evaluation and manual rollouts.
- The agent shows some ability to make progress but struggles with consistency and survival, indicating limits of pure imitation learning in this setting.
- The author plans to extend the approach with GAIL plus PPO to improve performance beyond imitation and is seeking community advice on limited-data BC, transitioning from BC to PPO, and handling partial observability.
- Code and results are shared via a GitHub repository so others can review the full experimental pipeline.
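The action-space remapping issue mentioned in the key points can be illustrated with a small sketch. It assumes the policy emits a compact MultiBinary vector over six logical buttons and that the emulator expects a twelve-slot button array. The button orders and the name mapping below are hypothetical placeholders, not the author's layout; the real ones depend on the emulator core and wrapper in use.

```python
# Hypothetical layouts: check the actual button order reported by your emulator wrapper.
POLICY_BUTTONS = ["LEFT", "RIGHT", "UP", "DOWN", "ATTACK", "JUMP"]
EMULATOR_BUTTONS = ["B", "Y", "SELECT", "START", "UP", "DOWN",
                    "LEFT", "RIGHT", "A", "X", "L", "R"]
# Assumed mapping from policy-level names to emulator button names.
NAME_MAP = {"LEFT": "LEFT", "RIGHT": "RIGHT", "UP": "UP", "DOWN": "DOWN",
            "ATTACK": "Y", "JUMP": "B"}

# Precompute the emulator slot index for each policy button.
SLOT = {name: EMULATOR_BUTTONS.index(NAME_MAP[name]) for name in POLICY_BUTTONS}

def to_emulator_action(policy_action):
    """Expand a compact 0/1 policy vector into the emulator's button array."""
    if len(policy_action) != len(POLICY_BUTTONS):
        raise ValueError("policy action has the wrong length")
    pressed = {n for n, bit in zip(POLICY_BUTTONS, policy_action) if bit}
    # Opposing directions cancel out, so the emulator never sees LEFT+RIGHT.
    for a, b in (("LEFT", "RIGHT"), ("UP", "DOWN")):
        if a in pressed and b in pressed:
            pressed -= {a, b}
    emulator_action = [0] * len(EMULATOR_BUTTONS)
    for name in pressed:
        emulator_action[SLOT[name]] = 1
    return emulator_action

def from_emulator_action(emulator_action):
    """Collapse a recorded emulator button array back to the policy vector."""
    if len(emulator_action) != len(EMULATOR_BUTTONS):
        raise ValueError("emulator action has the wrong length")
    return [int(bool(emulator_action[SLOT[name]])) for name in POLICY_BUTTONS]

# Example: walk right while attacking.
example = to_emulator_action([0, 1, 0, 0, 1, 0])
```

Keeping both directions of the mapping in one place, and round-tripping recorded demonstrations through them, is a cheap way to catch a mismatched button order before training.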