I researched model-based learning this summer. Here are the results.

CartPole

CartPole with the agent randomly pushing left or right at each timestep. The classic version describes the state with four numbers:
  • position of the cart
  • velocity of the cart
  • angle between cart and pole
  • angular velocity of the pole
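
As a rough illustration of the four-number state and the random policy, here is a minimal sketch in plain Python. The physics constants, the Euler update, and the termination limits (about 12 degrees, 2.4 units, 200 steps) are assumptions modeled on the commonly used Gym CartPole formulation, not code from this project. `step` and `random_episode` are hypothetical names.

```python
import math
import random

# Assumed constants, modeled on the widely used Gym CartPole setup.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
POLE_HALF_LENGTH = 0.5
FORCE = 10.0
DT = 0.02
ANGLE_LIMIT = math.radians(12)
POSITION_LIMIT = 2.4
MAX_STEPS = 200


def step(state, action):
    """Advance the 4-number state by one timestep (action: 0 = left, 1 = right)."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Simple Euler integration of position, velocity, angle, angular velocity.
    return (
        x + DT * x_dot,
        x_dot + DT * x_acc,
        theta + DT * theta_dot,
        theta_dot + DT * theta_acc,
    )


def random_episode(rng):
    """Push left or right at random until the pole falls; return the score."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(MAX_STEPS):
        state = step(state, rng.choice((0, 1)))
        x, _, theta, _ = state
        if abs(x) > POSITION_LIMIT or abs(theta) > ANGLE_LIMIT:
            return t + 1
    return MAX_STEPS


score = random_episode(random.Random(0))
print("random policy survived", score, "timesteps")
```

The score is simply the number of timesteps survived, which is why the maximum is 200 under these assumed limits.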

Model-based reinforcement learning

The inputs are an image of the current state and the action taken (left), and the output is the image at the next timestep (right), for a hypothetical environment model.

Building the model

We hypothesized that a model-based reinforcement learning agent could learn to solve CartPole faster than a similar model-free agent. In the rest of the post, I’ll show some fun pictures and the basic components of the research. I’m not covering the hundreds of experiments I had to run to get things working, since that’d take far too much space!

Model exploration

The first time I trained the model and hooked it up to a reinforcement learning agent, the agent completely failed to learn. To find out why, I visualized the model's predictions to check that it was learning what I wanted it to.

How well did the model learn to represent the environment? See above paragraph for details.
Imagination rollouts pushing left (top) and right (bottom)
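
An imagination rollout of this kind can be sketched as repeatedly feeding the model its own prediction together with a fixed action. The sketch below assumes the learned model is exposed as a callable `model(frame, action)` that returns the predicted next frame. That interface, the `imagine` helper, and the toy stand-in model are hypothetical and only illustrate the loop, not the network used in this research.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # stand-in for an image; a real frame would be a pixel array
LEFT, RIGHT = 0, 1


def imagine(model: Callable[[Frame, int], Frame], frame: Frame,
            action: int, steps: int) -> List[Frame]:
    """Roll the model forward, holding one action fixed the whole time."""
    frames = [frame]
    for _ in range(steps):
        # The model only sees its own previous prediction, so any
        # prediction errors can compound over the rollout.
        frame = model(frame, action)
        frames.append(frame)
    return frames


def toy_model(frame: Frame, action: int) -> Frame:
    """Hypothetical stand-in: shifts a 1-D 'image' one pixel left or right."""
    if action == LEFT:
        return list(frame[1:]) + [0.0]
    return [0.0] + list(frame[:-1])


start = [0.0, 0.0, 1.0, 0.0, 0.0]
left_rollout = imagine(toy_model, start, LEFT, steps=2)
right_rollout = imagine(toy_model, start, RIGHT, steps=2)
print(left_rollout[-1])
print(right_rollout[-1])
```

Laying the frames from `left_rollout` and `right_rollout` side by side gives the two rows of a picture like the one captioned above.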

Beating the game

Next, I generated baselines to compare against. Originally, I used the DQN implementation from OpenAI Baselines and made very slow progress, since it only used one CPU core at a time. I eventually switched to their A2C implementation, which was much faster because it could use all 8 of my CPU cores, did intelligent batching, and had a better learning rule. It solved classic CartPole (the 4-number state) in less than 30 seconds (cyan line below). The environment counts as solved when the mean score over the last 100 episodes is 195 or more, out of a maximum of 200.
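
The solved criterion is easy to express in code. This is a minimal sketch, not the check used by the Baselines agents. The helper name `make_solved_checker` is hypothetical, and waiting for a full window of 100 episodes before judging is an assumption.

```python
from collections import deque


def make_solved_checker(window=100, threshold=195.0):
    """Return a function that records a score and reports whether the task is solved."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` scores

    def record(score):
        recent.append(score)
        # Assumption: only judge once a full window of episodes has been played.
        return len(recent) == window and sum(recent) / window >= threshold

    return record


record = make_solved_checker()
results = [record(200) for _ in range(99)]  # 99 perfect episodes: window not full yet
solved_at_100 = record(200)                 # the 100th episode fills the window
print(any(results), solved_at_100)
```

Calling `record` once per finished episode in the training loop gives a stopping condition that matches "training ends when mean score exceeds 195".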

Comparison of different agents beating CartPole. Horizontal axis is training epochs (roughly equal to time), vertical axis is mean score for the last 100 games played. Training ends when mean score exceeds 195. Shorter is better.
