I Trained an AI to Beat Final Fight… Here’s What Happened [P]

Reddit r/MachineLearning / 5/4/2026


Key Points

  • The author trained an AI agent for the arcade game Final Fight using behavior cloning from demonstrations, then evaluated its progress in the first stage.
  • They encountered several practical training and evaluation issues, including action-space remapping from a MultiBinary representation to emulator inputs, trajectory alignment/offset bugs, and different LSTM behavior between evaluation and manual rollouts.
  • The agent shows some ability to make progress but struggles with consistency and survival, indicating limits of pure imitation learning in this setting.
  • The author plans to extend the approach with GAIL plus PPO to improve performance beyond imitation and is seeking community advice on limited-data BC, transitioning from BC to PPO, and handling partial observability.
  • Code and results are shared via a GitHub repository so others can review the full experimental pipeline.

Hey everyone,

I’ve been experimenting with Behavior Cloning on a classic arcade game (Final Fight), and I wanted to share the results and get some feedback from the community.

The setup is fairly simple: I trained an agent purely from demonstrations (no reward shaping initially), then evaluated how far it could go in the first stage. I also plan to extend this with GAIL + PPO to see how much performance improves beyond imitation.
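
For context, the BC phase boils down to supervised learning over (observation, action) pairs from the demos. Here's a simplified sketch of the idea in PyTorch; the shapes, names, and architecture are illustrative, not the exact code from the repo:

```python
import torch
import torch.nn as nn

# Illustrative shapes: 4 stacked 84x84 grayscale frames,
# 12 emulator buttons (MultiBinary action space).
N_BUTTONS = 12

class BCPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(),
            nn.Linear(512, N_BUTTONS),  # one logit per button
        )

    def forward(self, obs):
        return self.net(obs)

policy = BCPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
# Multi-label loss: each button is an independent on/off decision.
loss_fn = nn.BCEWithLogitsLoss()

def bc_step(obs_batch, act_batch):
    """One supervised update on demonstration pairs (obs_t, act_t)."""
    logits = policy(obs_batch)
    loss = loss_fn(logits, act_batch.float())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The detail that matters for a beat 'em up is the multi-label loss (BCEWithLogitsLoss) instead of plain cross-entropy, since MultiBinary actions let several buttons be held at once (e.g., walking while attacking).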

A few interesting challenges came up (quick sketches of each follow the list):

  • Action space remapping (MultiBinary → emulator input)
  • Trajectory alignment issues (obs/action offset bugs 😅)
  • LSTM policy behaving differently under evaluation vs manual rollout
  • Managing rollouts efficiently without loading everything into memory
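
On the remapping point: the MultiBinary vector needs a stable index → button mapping that matches the order the emulator expects. A simplified sketch (the button layout below is illustrative; the actual emulator ordering may differ):

```python
import numpy as np

# Illustrative button order -- the real emulator layout may differ.
BUTTONS = ["B", "A", "MODE", "START", "UP", "DOWN",
           "LEFT", "RIGHT", "C", "Y", "X", "Z"]

def action_to_buttons(action: np.ndarray) -> dict:
    """Map a MultiBinary(12) action vector to named emulator inputs."""
    assert action.shape == (len(BUTTONS),)
    return {name: bool(bit) for name, bit in zip(BUTTONS, action)}
```

Getting this ordering wrong fails silently: the policy trains fine but presses the wrong buttons at playback.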
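
The offset bugs are usually the classic off-by-one: pairing obs[t] with the action that *produced* it (action[t-1]) instead of the action taken in response to it. A sanity check along these lines helps (this assumes the recorder logs one trailing observation; adapt to your own logging format):

```python
def align(obs_list, act_list):
    """Pair each observation with the action taken *after* seeing it.

    If transitions were stored as (action, resulting_obs), the whole
    dataset is shifted by one frame and BC quietly learns to react
    one step late.
    """
    assert len(obs_list) == len(act_list) + 1  # one trailing obs
    return list(zip(obs_list[:-1], act_list))
```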
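
For the LSTM discrepancy, a common culprit is hidden-state handling: whether the recurrent state is carried between steps and reset on episode boundaries the same way in both code paths. A manual rollout sketch with explicit state threading (Gymnasium-style API; the policy signature here is illustrative):

```python
import torch

@torch.no_grad()
def rollout(env, policy, max_steps=5000):
    """Step the env while threading the LSTM state explicitly.

    Forgetting to carry `hidden` between steps, or to reset it on
    episode boundaries, makes manual rollouts diverge from the
    batched evaluation path.
    """
    obs, _ = env.reset()
    hidden = None  # most LSTM modules treat None as a zero state
    for _ in range(max_steps):
        x = torch.as_tensor(obs, dtype=torch.float32).unsqueeze(0)
        logits, hidden = policy(x, hidden)  # carry state forward
        # Threshold per-button logits into a MultiBinary action.
        action = (torch.sigmoid(logits) > 0.5).squeeze(0).numpy()
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
            hidden = None  # reset memory between episodes
```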
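
And for memory, one approach that scales is keeping the demos on disk and memory-mapping them, so batches are read lazily instead of loading every trajectory up front. A sketch with numpy memmaps and a PyTorch Dataset (the file layout is hypothetical):

```python
import numpy as np
from torch.utils.data import Dataset, DataLoader

class DemoDataset(Dataset):
    """Lazily reads (obs, action) pairs from memory-mapped .npy files."""

    def __init__(self, obs_path, act_path):
        # mmap_mode="r" keeps the arrays on disk; only the slices
        # touched by __getitem__ get paged into RAM.
        self.obs = np.load(obs_path, mmap_mode="r")
        self.acts = np.load(act_path, mmap_mode="r")

    def __len__(self):
        return len(self.acts)

    def __getitem__(self, i):
        # Copy so the returned arrays own their memory.
        return self.obs[i].copy(), self.acts[i].copy()

loader = DataLoader(DemoDataset("obs.npy", "acts.npy"),
                    batch_size=64, shuffle=True, num_workers=2)
```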

The agent can already make some progress, but still struggles with consistency and survival.

I’d love to hear thoughts on:

  • Improving BC performance with limited trajectories
  • Best practices for transitioning BC → PPO (one warm-start sketch follows this list)
  • Handling partial observability in these environments
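
On the BC → PPO question specifically, the pattern I keep seeing recommended is warm-starting: initialize the PPO policy network from the BC weights so RL fine-tunes the imitated behavior instead of starting from scratch. A rough sketch with sb3-contrib's RecurrentPPO, which also covers the partial-observability angle via its recurrent policy (it assumes the BC net and the PPO policy share an architecture and parameter names, which in practice needs a small key-mapping step):

```python
from sb3_contrib import RecurrentPPO

# `env` is the emulator environment; `bc_state_dict` holds the
# behavior-cloning weights. Matching parameter names/shapes between
# the BC net and the PPO policy is the fiddly part in practice.
model = RecurrentPPO("CnnLstmPolicy", env,
                     learning_rate=2.5e-4, verbose=1)
model.policy.load_state_dict(bc_state_dict, strict=False)

# Fine-tune with RL; a modest learning rate early on helps avoid
# catastrophically forgetting the cloned behavior.
model.learn(total_timesteps=1_000_000)
```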

Here’s the code if you want to see the full process and results:
notebooks-rl/final_fight at main · paulo101977/notebooks-rl: https://github.com/paulo101977/notebooks-rl/tree/main/final_fight

Any feedback is very welcome!

submitted by /u/AgeOfEmpires4AOE4
