AI Simplified 1 : Reinforcement Learning

What is reinforcement learning?

Basically, it is learning from interaction: an agent acts in an environment and receives a reward for achieving a goal. The field covers a class of problems, a class of solution methods for those problems, and the study of both. Of all the forms of machine learning, reinforcement learning is the closest to how animals and humans learn.

Elements of reinforcement learning

a) Policy - defines the learning agent's behavior at any given time. Basically, a mapping from states to actions.
b) Reward Signal - defines the goal of a reinforcement learning problem. The agent's aim is to maximize the total reward it receives.
c) Value Function - while the reward specifies what is good in the short run, the value function specifies what is good in the long run. The value of a state is the accumulated reward the agent can expect to collect starting from that state. Rewards are primary, whereas values are secondary: without rewards there would be no values to estimate.
d) Model - optional. Mimics the behavior of the environment and is used for planning. Methods that use a model are called model-based, whereas methods that do not are called model-free.
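
To make the first three elements concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the 4-cell corridor, the `step` function standing in for the environment, and the discount factor `gamma` (a common way to weight later rewards less, not something this post has defined). It only shows how a policy, a reward and a value relate to each other.

```python
# Hypothetical toy problem: a corridor with cells 0..3, where cell 3 is the goal.
GOAL = 3

# Policy: a mapping from each non-goal state to an action.
policy = {0: "right", 1: "right", 2: "right"}

def step(state, action):
    """Toy environment: move one cell; reward 1.0 only when the goal is reached."""
    if action == "right":
        next_state = min(state + 1, GOAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def value(state, gamma=0.9, max_steps=20):
    """Accumulated (discounted) reward from following the policy from `state`."""
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        if state == GOAL:
            break
        state, reward = step(state, policy[state])
        total += discount * reward  # later rewards count a little less
        discount *= gamma
    return total

values = {s: value(s) for s in policy}
print(values)
```

The reward is zero everywhere except on the final move, yet every state ends up with a non-zero value, and states closer to the goal are worth more. That is the short run versus long run distinction in point c). An agent that kept its own copy of something like `step` in order to plan ahead would be using a model, as in point d).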

Reinforcement versus Supervised and Unsupervised Learning

Supervised learning entails a training set: the learner is given a set of problems and their corresponding solutions. Unsupervised learning entails finding patterns in data without being given any solutions. One might wonder about the difference between unsupervised learning and reinforcement learning, as neither is given solutions and both are expected to figure things out as they go. Well, the key difference is that reinforcement learning seeks to maximize a reward, whereas unsupervised learning is more about finding a pattern.

Reinforcement paradigm: the balance

That being said, reinforcement learning has to find a balance between exploitation and exploration. The agent exploits what it already knows to obtain a reward, but it also has to explore to find better rewards in the future. Striking this balance is the main challenge that sets reinforcement learning apart from supervised and unsupervised learning.
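
One simple way to see this trade-off is the epsilon-greedy rule on a made-up slot-machine ("bandit") problem. The sketch below is only an illustration under stated assumptions: the function name `run_bandit`, the three payout means and the value of `epsilon` are all hypothetical, and epsilon-greedy is just one of several strategies for balancing the two.

```python
import random

def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Epsilon-greedy agent on a hypothetical bandit with noisy payouts."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # the agent's current guess of each arm's payout
    counts = [0] * n_arms       # how many times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to learn more about it.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pull the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payout
        counts[arm] += 1
        # Incremental average of the rewards seen so far for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

estimates, counts, total = run_bandit([0.2, 0.5, 0.9])
print(counts, [round(e, 2) for e in estimates])
```

With `epsilon=0` the agent never explores on purpose and can get stuck on whichever arm looked good early on. With `epsilon=1` it explores all the time and never cashes in on what it has learned. Values in between trade one against the other.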
