Extending OpenAI Baselines to Integrate Pygame Learning Environment

As I mentioned in my last post on implementing a DQN in Keras, there are better resources than coding a DQN from scratch, and one of them is OpenAI baselines. The package includes built-in examples for using a DQN to play CartPole and Atari games. Using those as a starting point, I wrote some code to extend baselines so I could run some more advanced DQNs on the full Pygame Learning Environment (PLE).

The Code
See my fork of baselines here. Two existing files received minor modifications, and I added two new files: one to run DQN training and one to watch the trained DQN play in the game environment.

run_ple_dqn.py is called from the command line to start training. Like its baselines counterpart, run_atari.py, it creates an environment, feeds in various arguments (which I expanded), calls simple.py to train the model, and then saves the model. enjoy_ple_dqn.py loads the saved model so you can watch the results.
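To give a rough idea of the training flow in run_ple_dqn.py, here is a minimal sketch. It assumes PLE's standard API and the old baselines deepq helpers; the exact keyword arguments, the saved filename, and whether learn_ple() is exported from the deepq package this way are all assumptions, so check the fork for the real signatures.

    from ple import PLE
    from ple.games.catcher import Catcher
    from baselines import deepq

    def main():
        # Wrap the PLE game; the display stays off during training for speed.
        game = Catcher()
        env = PLE(game, fps=30, display_screen=False)
        env.init()

        # Convolutional Q-network of the same family run_atari uses.
        model = deepq.models.cnn_to_mlp(
            convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
            hiddens=[256],
            dueling=True,
        )

        # learn_ple() lives in my modified simple.py; the arguments shown
        # here are placeholders for the expanded options.
        act = deepq.learn_ple(env, q_func=model, max_timesteps=1000000)

        # Save the trained policy so enjoy_ple_dqn.py can load it later.
        act.save("catcher_model.pkl")

    if __name__ == '__main__':
        main()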

For example, to train Catcher and then watch the resulting Catcher model, run these lines from the command line:
    python -m baselines.deepq.experiments.run_ple_dqn  --env=catcher --num-timesteps=1000000
    python -m baselines.deepq.experiments.enjoy_ple_dqn  --env=catcher

The last file of note is simple.py, which contains the PLE-specific training function learn_ple(). Compare it with learn() to see how a PLE environment is handled differently from a standard Gym environment, and to see the expanded options available to the learn_ple() DQN.
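For anyone unfamiliar with PLE, the core difference is the stepping interface. Here's an illustrative loop (not code from the fork) showing what learn_ple() has to bridge: a Gym env returns the observation, reward, done flag, and info from a single step() call, while PLE splits those across act(), getScreenGrayscale(), and game_over().

    import random
    from ple import PLE
    from ple.games.catcher import Catcher

    game = Catcher()
    env = PLE(game, fps=30, display_screen=False)
    env.init()
    actions = env.getActionSet()

    env.reset_game()
    for _ in range(100):
        # Gym's env.step(action) would return (obs, reward, done, info) at once;
        # PLE returns only the reward from act() ...
        reward = env.act(random.choice(actions))
        # ... while the observation and terminal flag are queried separately.
        obs = env.getScreenGrayscale()
        if env.game_over():
            env.reset_game()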

Games and Results
I mostly used the default options with no hyperparameter search and was able to achieve good results on Catcher, Pong, Pixelcopter, and Snake. I trained the games on my GPU, with each game taking about 8 to 24 hours and running for 1,000,000 to 4,000,000 steps. Logged results are available here. The saved models that can be loaded to run demonstrations are around 90 MB apiece, so they aren't GitHub friendly, but I can make them available elsewhere if anyone is interested. Waterworld achieved meh results. FlappyBird achieved poor results, which could probably be improved with a better hyperparameter search; in particular, I think it needed another 1 million steps, as it was just starting to show improvement when training finished.

Raycast Maze and Puckworld, for game-specific reasons, did not work at all. Raycast Maze, for example, has no concept of time: it averaged the maximum reward per episode from the very beginning, since the agent always eventually found the maze exit, no matter how many steps it took. Baselines was never able to save any improvement, because there was never any improvement over the max reward per episode for it to acknowledge. Monster Kong I'm saving for my next experiment.
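For context, that saving behavior comes from the checkpoint condition in baselines' simple.py, which only writes a checkpoint when the running 100-episode mean reward beats the best mean seen so far. Roughly (simplified from memory; the real code also gates on checkpoint frequency and warm-up steps):

    def should_save_checkpoint(mean_100ep_reward, saved_mean_reward):
        # Save only when the 100-episode mean reward improves on the best so far.
        return saved_mean_reward is None or mean_100ep_reward > saved_mean_reward

    # Raycast Maze sits at its per-episode reward ceiling from the start, so
    # mean_100ep_reward never improves and no further checkpoints get written.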


Up Next
I'm going to use these libraries to do a Monster Kong (Donkey Kong) DQN with hyperparameter search in a Google Cloud environment.
