I researched model-based learning this summer. Here are the results.


The CartPole problem, with the cart pushed randomly left or right at each timestep. The state is described by four values:
  • position of the cart
  • velocity of the cart
  • angle between cart and pole
  • angular velocity of the pole
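
To make the four state values and the random pushing concrete, here is a minimal sketch of the CartPole dynamics in plain Python. The constants, the Euler update and the failure limits are an assumption: they follow the classic CartPole task as implemented in OpenAI Gym, which may differ from the exact setup used for this post. The names step and random_episode are made up for illustration.

```python
import math
import random

# Assumed constants, taken from the classic CartPole task in OpenAI Gym.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE = 10.0
DT = 0.02                          # seconds per timestep
X_LIMIT = 2.4                      # episode ends if the cart leaves this range
ANGLE_LIMIT = 12 * math.pi / 180   # ... or if the pole tilts past 12 degrees


def step(state, action):
    """Advance one timestep (Euler integration). action: 0 = left, 1 = right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    sin_t, cos_t = math.sin(theta), math.cos(theta)

    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    next_state = (
        x + DT * x_dot,              # position of the cart
        x_dot + DT * x_acc,          # velocity of the cart
        theta + DT * theta_dot,      # angle between cart and pole
        theta_dot + DT * theta_acc,  # angular velocity of the pole
    )
    done = abs(next_state[0]) > X_LIMIT or abs(next_state[2]) > ANGLE_LIMIT
    return next_state, done


def random_episode(rng, max_steps=200):
    """Push left or right at random; return the number of timesteps survived."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(1, max_steps + 1):
        state, done = step(state, rng.choice((0, 1)))
        if done:
            return t
    return max_steps


print(random_episode(random.Random(0)))
```

The returned step count plays the role of the score: the longer the pole stays up, the higher it is.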

Model-based reinforcement learning

A hypothetical environment model: the inputs are an image of the current state and the action taken (left), and the output is an image of the next timestep (right).
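
The input/output contract of such a model can be sketched in a few lines. This is a simplified stand-in, not the model from this post: it works on small tuples of numbers instead of images, and it swaps the learned network for a nearest-neighbour lookup over recorded transitions. The environment toy_step, the helper collect_transitions and the class NearestNeighbourModel are all hypothetical names.

```python
import random


def toy_step(state, action):
    """Hypothetical stand-in environment: a 1-D cart with position and velocity."""
    position, velocity = state
    velocity += 0.1 if action == 1 else -0.1
    return (position + velocity, velocity)


def collect_transitions(step_fn, n, rng):
    """Play randomly and record (state, action, next_state) training examples."""
    transitions = []
    state = (0.0, 0.0)
    for _ in range(n):
        action = rng.choice((0, 1))
        next_state = step_fn(state, action)
        transitions.append((state, action, next_state))
        state = next_state
    return transitions


class NearestNeighbourModel:
    """Predicts the next state by looking up the most similar recorded state."""

    def __init__(self, transitions):
        self.transitions = transitions

    def predict(self, state, action):
        # Only consider recorded examples where the same action was taken.
        candidates = [(s, ns) for s, a, ns in self.transitions if a == action]
        _, next_state = min(
            candidates,
            key=lambda pair: sum((u - v) ** 2 for u, v in zip(pair[0], state)),
        )
        return next_state


data = collect_transitions(toy_step, 500, random.Random(0))
model = NearestNeighbourModel(data)
```

Calling model.predict(state, action) never touches the real environment, which is the point of a model: once it is good enough, an agent can plan or practise against the prediction instead.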

Building the model

Model exploration

How well did the model learn to represent the environment? See the paragraph above for details.
Imagination rollouts when pushing left (top) and right (bottom)
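
An imagination rollout boils down to feeding the model's own prediction back in as the next input. The sketch below shows that loop under simplifying assumptions: toy_model is a hypothetical hand-written stand-in for a learned model, and states are small tuples rather than images.

```python
def toy_model(state, action):
    """Hypothetical stand-in for a learned model. action: 0 = left, 1 = right."""
    position, velocity = state
    velocity += 0.1 if action == 1 else -0.1
    return (position + velocity, velocity)


def imagine(model, state, action, steps):
    """Roll the model forward, repeating one action, without the real environment."""
    trajectory = [state]
    for _ in range(steps):
        state = model(state, action)  # the prediction becomes the next input
        trajectory.append(state)
    return trajectory


left = imagine(toy_model, (0.0, 0.0), 0, 5)   # always push left
right = imagine(toy_model, (0.0, 0.0), 1, 5)  # always push right
```

Because every step builds on the previous prediction, small model errors can compound over a long rollout, which is why inspecting rollouts is a useful check on model quality.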

Beating the game

Comparison of different agents beating CartPole. The horizontal axis is training epochs (roughly equal to time); the vertical axis is the mean score over the last 100 games played. Training ends when the mean score exceeds 195, so a shorter curve is better.
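
The stopping rule from the caption is easy to express in code. This is a minimal sketch; the class name ScoreTracker is made up, and requiring a full window of 100 games before declaring success is my assumption rather than something stated above.

```python
from collections import deque


class ScoreTracker:
    """Tracks the mean score over the most recent games."""

    def __init__(self, window=100, target=195.0):
        self.scores = deque(maxlen=window)  # old scores drop out automatically
        self.target = target

    def add(self, score):
        self.scores.append(score)

    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def solved(self):
        # Assumption: only count as solved once the window is full.
        full = len(self.scores) == self.scores.maxlen
        return full and self.mean() > self.target


tracker = ScoreTracker()
for _ in range(100):
    tracker.add(200)
print(tracker.solved())
```

In a training loop, add would be called once per finished game and the loop would break as soon as solved returns True.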




