
Posts

Showing posts from April, 2020

AI Simplified 3 : Q Learning - From State to Action

Q Learning

Previously we were looking at the value of a state. Q-learning moves to calculating the value of an action: we now act based on the value of our actions as opposed to the value of a state. People tend to think it's called "Q" as shorthand for quality learning.

Now let's derive the equation for Q-learning.

Deriving the equation

Remember the stochastic Markov equation?

V(s) = maxₐ (R(s,a) + ɣ ∑ₛ' P(s, a, s') V(s'))

The value of an action equals the value of a state, i.e. V(s) = Q(s,a)

↡ Q(s,a) = R(s,a) + ɣ ∑ₛ' P(s, a, s') V(s')

There is no max here because we are not considering all of the alternative actions, just one action. We need to wean ourselves off V, so we need to replace V(s'). V(s') represents the value of the possible next states, and it is also worth noting that V(s') = maxₐ' Q(s', a'). With that it becomes

↡ Q(s,a) = R(s,a) + ɣ ∑ₛ' P(s, a, s') maxₐ' (Q(s', a'))

Why max? Well, we still want to get all the po…
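The update that falls out of this equation can be sketched in a few lines. Below is a minimal, illustrative tabular Q-learning loop on a hypothetical 5-state corridor (move left or right, reward 1.0 for reaching the rightmost state); the environment, constants, and 500-episode budget are assumptions for the sketch, not from the post. The agent behaves randomly yet still learns the greedy values, since the max over next actions sits inside the update (off-policy learning).

```python
import random

# A toy 5-state chain: actions 0 = left, 1 = right; reward only at the end.
n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.5            # discount factor and learning rate

Q = [[0.0] * n_actions for _ in range(n_states)]

def step(s, a):
    """Deterministic toy dynamics; reward 1.0 on reaching the last state."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == n_states - 1 else 0.0)

random.seed(0)
for _ in range(500):               # episodes, each starting at the left end
    s = 0
    while s != n_states - 1:
        a = random.randrange(n_actions)   # explore at random (off-policy)
        s2, r = step(s, a)
        # Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])  # state values grow toward the goal
```

Each printed value is maxₐ Q(s,a), i.e. the recovered V(s); values should rise geometrically (by a factor of ɣ per step) as states get closer to the rewarding end.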

AI Simplified 2 : Bellman and Markov

Bellman equation

It's named after Richard Bellman and gives a necessary condition for optimality in the mathematical optimization method known as dynamic programming. The deterministic equation is listed below:

V(s) = maxₐ (R(s,a) + ɣ V(s'))

maxₐ: take the maximum over all the possible actions we can take.
R(s, a): the reward for taking action a in state s.
ɣ: the discount factor. It works like the time value of money.

You can see the dynamic programming aspect, as we call the same function on s'; it operates recursively to solve the problem.

Deterministic vs non-deterministic: deterministic is definite, with no randomness, whereas non-deterministic is stochastic. Above we were not adding any randomness (deterministic), but nothing in this world is truly predictable, so let's add randomness, whereby each step is no longer certain to be taken (adding probability). It makes our agent more natural (being drunk! lol) so…
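The recursion above can be solved by repeated sweeps (value iteration). Here is a minimal sketch using the deterministic Bellman equation on a hypothetical 1-D grid; the environment, reward, and iteration count are illustrative assumptions, not from the post.

```python
# Value iteration with the deterministic Bellman equation:
#   V(s) = max_a (R(s,a) + gamma * V(s'))
# on a toy 5-state corridor (illustrative environment).
n_states = 5
gamma = 0.9
actions = (-1, +1)                       # step left or step right

def next_state(s, a):
    return min(n_states - 1, max(0, s + a))

def reward(s, a):
    # reward 1.0 for stepping into the rightmost (terminal) state
    return 1.0 if next_state(s, a) == n_states - 1 else 0.0

V = [0.0] * n_states
for _ in range(50):                      # sweep until the values settle
    V = [0.0 if s == n_states - 1 else   # terminal state keeps value 0
         max(reward(s, a) + gamma * V[next_state(s, a)] for a in actions)
         for s in range(n_states)]

print([round(v, 3) for v in V])          # -> [0.729, 0.81, 0.9, 1.0, 0.0]
```

Notice how ɣ acts like the time value of money: the reward of 1.0 at the goal is worth 0.9 one step away, 0.81 two steps away, and so on.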

AI Simplified 1 : Reinforcement Learning

What is reinforcement learning?

Basically, it is learning from interaction: we give our agent a reward for achieving a goal. The field is essentially a class of problems, a class of solutions, and the study of those classes. Of all the forms of machine learning, reinforcement learning is the closest to how animals and humans learn.

Elements of reinforcement learning

a) Policy - defines the learning agent's behavior at any given time; basically a mapping from states to actions.
b) Reward signal - the goal of reinforcement learning. The aim is to maximize it.
c) Value function - while the reward specifies what's good in the short run, the value function specifies what's good in the long run; it is an accumulation of rewards. Rewards are primary, whereas the value function is secondary.
d) Model - optional. It mimics the environment and is used for planning. Methods with models are called model-based, whereas the opposite is called model-free.

Reinforcement versus Supervised and Unsupervised Learnin…
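The elements above fit together in a simple interaction loop, sketched below. The tiny corridor environment and the hand-written policy are hypothetical, purely for illustration (and this sketch is model-free: no model element is used).

```python
# A schematic agent-environment loop for the elements above.

policy = {s: "right" for s in range(5)}   # a) policy: maps states to actions

def environment(s, a):
    """Toy corridor: returns the next state and b) a reward signal."""
    s2 = min(4, s + 1) if a == "right" else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0)

# c) the value of the start state accumulates discounted rewards over time
gamma = 0.9
s, value, discount = 0, 0.0, 1.0
while s != 4:
    a = policy[s]                 # the policy chooses the action
    s, r = environment(s, a)
    value += discount * r         # rewards accumulate into the value
    discount *= gamma

print(round(value, 3))            # -> 0.729
```

The single reward at the goal (primary) becomes, once accumulated and discounted along the way, the value of the starting state (secondary).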