This summer I researched model-based reinforcement learning. Here are the results.


The CartPole problem, with the agent randomly pushing left or right at each timestep. At each timestep the agent observes four values:
  • position of the cart
  • velocity of the cart
  • angle between cart and pole
  • angular velocity of the pole
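To make the four state variables concrete, here is a minimal pure-Python sketch of the cart-pole dynamics driven by a random agent. The physical constants, the Euler integration, the 200-step cap and the termination limits (cart beyond ±2.4, pole beyond ±12°) are assumptions. They are modeled on the classic cart-pole setup used by OpenAI Gym's CartPole and are not taken from this post.

```python
import math
import random

# Constants assumed to match the classic cart-pole setup (as in OpenAI Gym's CartPole).
GRAVITY = 9.8
MASS_CART = 1.0
MASS_POLE = 0.1
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_POLE_LENGTH = 0.5
POLE_MASS_LENGTH = MASS_POLE * HALF_POLE_LENGTH
FORCE_MAG = 10.0
TAU = 0.02  # seconds per timestep
X_LIMIT = 2.4
THETA_LIMIT = 12 * 2 * math.pi / 360  # 12 degrees in radians


def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) one timestep; action 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_POLE_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Explicit Euler integration of position, velocity, angle and angular velocity.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc
    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    return (x, x_dot, theta, theta_dot), done


def random_episode(rng, max_steps=200):
    """Push left or right at random; the score is the number of steps survived."""
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    for t in range(1, max_steps + 1):
        state, done = step(state, rng.choice([0, 1]))
        if done:
            return t
    return max_steps


rng = random.Random(0)
scores = [random_episode(rng) for _ in range(100)]
print(sum(scores) / len(scores))  # random play keeps the pole up only briefly
```

A random agent is a useful baseline. It shows how quickly the pole falls when nobody is steering.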

Model-based reinforcement learning

A hypothetical environment model: the inputs are an image of the current state and the action taken (left), and the output is an image of the next timestep (right).
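The environment model has a simple core contract: (state, action) in, predicted next state out. The model in this post works on images. The sketch below swaps in a much simpler stand-in that should not be read as the author's approach: low-dimensional state vectors, and a linear model fitted by least squares to recorded transitions. The `toy_env` system that generates the training data is hypothetical.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_model(transitions):
    """Fit next_state[j] ~ w_j . [state, action, 1] by least squares for each output j."""
    features = [list(s) + [a, 1.0] for s, a, _ in transitions]
    dim_in = len(features[0])
    # Normal equations: (F^T F) w_j = F^T y_j, shared F^T F for every output dimension.
    ftf = [[sum(f[i] * f[k] for f in features) for k in range(dim_in)] for i in range(dim_in)]
    weights = []
    for j in range(len(transitions[0][2])):
        fty = [sum(f[i] * t[2][j] for f, t in zip(features, transitions)) for i in range(dim_in)]
        weights.append(solve(ftf, fty))

    def model(state, action):
        x = list(state) + [action, 1.0]
        return [sum(w * xi for w, xi in zip(wj, x)) for wj in weights]

    return model


# Hypothetical "real" environment: (position, velocity), action 1 pushes right, 0 pushes left.
def toy_env(state, action):
    position, velocity = state
    push = 0.2 if action == 1 else -0.2
    return (position + 0.1 * velocity, velocity + push)


rng = random.Random(0)
transitions = []
for _ in range(200):
    s = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    a = rng.choice([0, 1])
    transitions.append((s, a, toy_env(s, a)))

model = fit_linear_model(transitions)
print(model((0.5, -0.5), 1))  # approximately toy_env((0.5, -0.5), 1) = (0.45, -0.3)
```

The interface stays the same when the linear fit is replaced by a neural network over image frames. Only the representation of the state and the function class change.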

Building the model

Model exploration

How well did the model learn to represent the environment?
Imagination rollouts pushing left (top) and right (bottom)
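An imagination rollout runs the learned model in a loop and never touches the real environment. Each prediction becomes the next input while the action is held fixed. The sketch below assumes a model with the signature `model(state, action) -> next_state`. The `toy_model` it uses is a hypothetical stand-in for a trained model.

```python
import math


def imagine(model, state, action, horizon):
    """Roll the model forward, feeding each predicted state back in as the next input."""
    trajectory = [tuple(state)]
    for _ in range(horizon):
        state = model(state, action)
        trajectory.append(tuple(state))
    return trajectory


# Hypothetical stand-in for a learned model: (position, velocity), action 1 pushes right.
def toy_model(state, action):
    position, velocity = state
    push = 0.1 if action == 1 else -0.1
    return (position + velocity, velocity + push)


left = imagine(toy_model, (0.0, 0.0), 0, 3)   # like the "pushing left" rollout
right = imagine(toy_model, (0.0, 0.0), 1, 3)  # like the "pushing right" rollout
print(left[-1], right[-1])
```

Each step consumes the previous prediction rather than a real observation. Small model errors therefore compound over long horizons, which is why imagined frames tend to drift from reality the further out they go.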

Beating the game

Comparison of different agents learning to beat CartPole. The horizontal axis is training epochs (roughly proportional to time); the vertical axis is the mean score over the last 100 games played. Training ends when the mean score exceeds 195, so shorter curves are better.
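The stopping rule in the caption is a moving average over the last 100 games, compared against 195. A bounded `collections.deque` is a minimal way to sketch this. One assumption goes beyond the caption: the check waits until a full window of 100 games has been recorded.

```python
from collections import deque


class SolvedTracker:
    """Track the mean score of the last `window` games and report when it exceeds `threshold`."""

    def __init__(self, window=100, threshold=195.0):
        self.scores = deque(maxlen=window)  # oldest scores drop off automatically
        self.threshold = threshold

    def add(self, score):
        self.scores.append(score)

    def mean(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def solved(self):
        # Assumption: require a full window so a few lucky early games cannot end training.
        return len(self.scores) == self.scores.maxlen and self.mean() > self.threshold


tracker = SolvedTracker()
for _ in range(100):
    tracker.add(200)
print(tracker.solved())
```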




Software engineer, tinkerer, aspiring mad scientist

Ben Mann