
AI Simplified 2: Bellman and Markov

Bellman equation 

It's named after Richard Bellman and expresses a necessary condition for optimality. It is associated with the mathematical optimization method known as dynamic programming. The deterministic form of the equation is listed below:

V(s) = maxₐ (R(s,a) + ɣV(s'))


maxₐ: take the maximum over all the possible actions we can take.
R(s, a): the reward for taking action a in state s.
ɣ: the discount factor. It works like the time value of money: reward received later is worth less than reward received now.
V(s'): the value of the next state s' that the action leads to.

You can see the dynamic programming aspect in the way we call the same value function on s': the equation is applied recursively to solve the problem.
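To see the recursion in action, here is a minimal Python sketch of the deterministic equation on a tiny made-up chain of states (the states, rewards and recursion cut-off below are assumptions for illustration, not part of any library):

# Minimal sketch of the deterministic Bellman equation on an assumed
# chain of states 0..3, where moving "right" heads towards a goal at state 3.
GAMMA = 0.9
STATES = [0, 1, 2, 3]            # state 3 is the goal (assumption)
ACTIONS = ["left", "right"]

def next_state(s, a):
    # deterministic transition: no randomness at all
    return min(s + 1, 3) if a == "right" else max(s - 1, 0)

def reward(s, a):
    # assumed reward: +1 for the step that reaches the goal, 0 otherwise
    return 1.0 if next_state(s, a) == 3 and s != 3 else 0.0

def value(s, depth=10):
    # V(s) = max_a ( R(s,a) + gamma * V(s') ), solved recursively
    if s == 3 or depth == 0:     # goal state, or recursion cut-off
        return 0.0
    return max(reward(s, a) + GAMMA * value(next_state(s, a), depth - 1)
               for a in ACTIONS)

print(value(0))                  # value of the starting state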

Deterministic vs non-deterministic: a deterministic process is definite, with no randomness, whereas a non-deterministic process is stochastic.

Above we were not adding any randomness (deterministic), but nothing in this world is truly predictable, so let's add randomness, where each step is no longer certain to happen as intended (adding probability). It makes our agent more natural (a bit drunk! lol): maybe our agent has an 80% probability of going where it intended, a 10% probability of veering one way and a 10% probability of veering the other way.
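To make the 80/10/10 idea concrete, here is a small sketch of what such a transition model could look like (the grid, the slip directions and the probabilities are all made-up assumptions for illustration):

# Illustrative sketch of a stochastic ("drunk") move on an assumed grid:
# the intended direction happens 80% of the time, and the agent slips to
# either perpendicular direction 10% of the time each.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}

def step(state, direction, size=4):
    # clamp to a size x size grid so the agent cannot walk off the edge
    x, y = state
    dx, dy = MOVES[direction]
    return (min(max(x + dx, 0), size - 1), min(max(y + dy, 0), size - 1))

def transition_model(state, action):
    # P(s' | s, a): intended move 80%, slip sideways 10% each
    probs = {}
    for direction, p in [(action, 0.8), (SLIPS[action][0], 0.1), (SLIPS[action][1], 0.1)]:
        s_next = step(state, direction)
        probs[s_next] = probs.get(s_next, 0.0) + p   # merge outcomes that hit a wall
    return probs

print(transition_model((0, 0), "right"))  # e.g. {(1, 0): 0.8, (0, 1): 0.1, (0, 0): 0.1}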

Markov Decision Process 

This is essentially the Bellman equation, but catering for non-deterministic scenarios. We loop through each possible next state, multiply the value of that state by its probability of occurring, and sum them all up.
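A small sketch of that expectation step, reusing the hypothetical transition_model above (V is assumed to be a dict mapping states to their current value estimates):

# Sketch of the expectation inside the MDP version of the equation:
# sum over every possible next state of P(s'|s,a) * V(s').
def expected_next_value(V, state, action):
    return sum(prob * V[s_next]
               for s_next, prob in transition_model(state, action).items())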

The Markov Property

This property states that the past does not matter: the future depends only on the present, so the process only needs to look at the current state. It also helps us save memory, since we do not need to store past states.

Concept

A stochastic process has the Markov property if the conditional probability of the future states of the process depends only on the current state. The MDP provides a mathematical framework for modeling decision making where outcomes are partly random and partly under the control of the decision-maker. It adds realism to the Bellman equation by changing it from deterministic to non-deterministic.
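As a toy illustration of the property (the weather chain and its probabilities below are made up), notice that the sampler only ever needs the current state, never the history:

import random

# Toy Markov chain (assumed probabilities): the next state depends only on
# the current state, so we never need to store or consult earlier states.
CHAIN = {"sunny": [("sunny", 0.8), ("rainy", 0.2)],
         "rainy": [("sunny", 0.4), ("rainy", 0.6)]}

def next_weather(current):
    # the whole history is irrelevant; only `current` is used
    outcomes, weights = zip(*CHAIN[current])
    return random.choices(outcomes, weights=weights)[0]

state = "sunny"
for _ in range(5):
    state = next_weather(state)   # past states can be thrown away immediately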

The equation

S: This is the state our agent is in.
Model: T(s,a,s') ~ Pr(s'|s,a). Also known as the transition model. It determines the physics or rules of the environment, and the function produces the probability of ending up in state s' after taking action a in state s.
Actions: A(s), A. Things you can do in a state.
Reward: R(s), R(s,a), R(s,a,s'). Represents the reward of being in a state. The main focus is R(s). This is the reward function.
Policy: 𝝿(s)→a, 𝝿*. The solution to an MDP. It differs from a plan because a plan is just a sequence of actions, whereas a policy tells you what to do in any state. You can think of it as key-value pairs mapping each state to an action, and 𝝿* denotes the optimal policy.

V(s) = maxₐ (R(s,a) + ɣ ∑s' P(s,a,s') V(s'))
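Putting the pieces together, here is a rough value-iteration-style sketch of the non-deterministic equation, reusing the hypothetical MOVES and transition_model from the earlier sketch; the grid size, goal, rewards and number of sweeps are all assumptions for illustration:

# Rough value-iteration sketch of
# V(s) = max_a ( R(s,a) + gamma * sum_s' P(s,a,s') V(s') )
GAMMA = 0.9
GRID = [(x, y) for x in range(4) for y in range(4)]
GOAL = (3, 3)                               # assumed goal state

def reward(state, action):
    return 1.0 if state == GOAL else 0.0    # assumed: reward only at the goal

V = {s: 0.0 for s in GRID}
for _ in range(50):                         # sweep until values roughly stop changing
    V = {s: max(reward(s, a) + GAMMA * sum(p * V[s2]
                                           for s2, p in transition_model(s, a).items())
                for a in MOVES)
         for s in GRID}

print(round(V[(0, 0)], 3))                  # approximate value of the start state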


Living penalty: a negative reward for simply being in a particular (non-terminal) state. The main incentive for this is to make the agent want to finish the game as quickly as possible.
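For instance, in the sketch above the living penalty could be expressed as a small negative reward for every non-goal state (the -0.04 figure is just an assumed example, not a fixed rule):

def reward(state, action):
    # assumed living penalty: every step spent in a non-goal state costs a little,
    # which pushes the agent to reach the goal quickly
    return 1.0 if state == GOAL else -0.04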


