Reinforcement Learning

Let's start with a small example.
Suppose there is a robot that is learning to walk.
It takes a big step forward, but loses its balance and falls down. The robot tries again with an even bigger step and falls down again. Now the robot looks more closely at why it keeps falling and takes a smaller step. And wow, now it is able to keep its balance.
The robot tries many variations of step size, many times over, and in the end it learns the right step size to move steadily.

And congratulations! It has succeeded.

The above is an example of reinforcement learning. Instead of learning the complex relationship between action and outcome, the robot simply connects each action with the outcome it produces. Reward (keeping its balance) and punishment (falling) are the basis of learning here. This feedback is considered "reinforcement" for doing or not doing an action.
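The robot's trial and error can be sketched in a few lines of Python. This is only a minimal illustration, not a real walking controller: the three step sizes, the fall probabilities, and the function name learn_step_size are all made up for the example. The robot mostly repeats the step size that has worked best so far, occasionally tries another one, and keeps a running average of the reward (+1 for staying balanced, -1 for falling) for each step size.

```python
import random

def learn_step_size(step_sizes, fall_probability, trials=2000, epsilon=0.1, seed=0):
    """Estimate the average reward of each step size by trial and error."""
    rng = random.Random(seed)
    value = {s: 0.0 for s in step_sizes}  # running average reward per step size
    count = {s: 0 for s in step_sizes}    # how often each step size was tried
    for _ in range(trials):
        # Mostly repeat the best-known step size, occasionally explore another.
        if rng.random() < epsilon:
            step = rng.choice(step_sizes)
        else:
            step = max(step_sizes, key=lambda s: value[s])
        # Reward: +1 for staying balanced, -1 (punishment) for falling.
        reward = -1.0 if rng.random() < fall_probability[step] else 1.0
        count[step] += 1
        value[step] += (reward - value[step]) / count[step]
    return value

# Hypothetical numbers: bigger steps make a fall more likely.
step_sizes = ["small", "medium", "big"]
fall_probability = {"small": 0.05, "medium": 0.5, "big": 0.95}

values = learn_step_size(step_sizes, fall_probability)
best = max(values, key=values.get)
print("learned step size:", best)
```

With these assumed probabilities, the small step ends up with the highest average reward, which mirrors the robot settling on the step size that keeps it upright.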

Let's discuss one more example: the board game Go.

If you are not familiar with this game, you can learn more about it at https://en.wikipedia.org/wiki/Go_(game)

In short, according to Wikipedia, Go is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent.

If the computer player puts down its black piece at a location, and that piece then gets surrounded by white pieces so the space is lost, the computer is punished for making that move. After being beaten a few times, the computer player will avoid putting a black piece in that location when white pieces are around.

Surrounded

Put simply, reinforcement learning can be defined as:

Reinforcement learning is learning the best actions based on rewards and punishments.

There are three basic concepts in reinforcement learning:

State
Action
Reward
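
To make these three concepts concrete, here is a tiny sketch in Python. The setting is invented purely for illustration: a line of cells numbered 0 to 3, where the state is the current cell, the action is "left" or "right", and the reward is 1 when the goal cell is reached and 0 otherwise. The names GOAL and step are assumptions of this example, not part of any library.

```python
from typing import Tuple

GOAL = 3  # hypothetical goal cell on a line of cells 0..3

def step(state: int, action: str) -> Tuple[int, float]:
    """Apply an action to a state; return the next state and the reward."""
    move = 1 if action == "right" else -1
    # Stay inside the line of cells 0..GOAL.
    next_state = min(max(state + move, 0), GOAL)
    # Reward is given only for reaching the goal cell.
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

state = 0
total_reward = 0.0
for action in ["right", "left", "right", "right", "right"]:
    state, reward = step(state, action)
    total_reward += reward
    print(f"action={action:5s} -> state={state}, reward={reward}")
```

Each pass through the loop is one round of the cycle: the learner is in a state, picks an action, and receives a reward along with the next state.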