
AI Simplified 2 : Bellman and Markov

Bellman equation 

The Bellman equation is named after Richard Bellman. It states a necessary condition for optimality and is associated with the mathematical optimization method known as dynamic programming. The deterministic form is:

V(s) = maxₐ (R(s,a) + ɣV(s'))


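maxₐ: take the maximum over all the possible actions a that we can take in state s.
R(s, a): the reward for taking action a in state s.
ɣ: the discount factor, a number between 0 and 1. It works like the time value of money: a reward received later is worth less than the same reward received now.
s': the state we end up in after taking action a in state s.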

You can see the dynamic programming aspect here: V appears on both sides, so the value of s is defined in terms of the value of the next state s'. The equation is applied recursively until the problem is solved.
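To make the recursion concrete, here is a minimal Python sketch of the deterministic equation. The tiny world in TRANSITIONS, its rewards, and the discount of 0.9 are made-up values for illustration only. The plain recursion terminates here only because this made-up world has no loops.

```python
from functools import lru_cache

GAMMA = 0.9  # assumed discount factor
GOAL = 3     # assumed terminal state

# Hypothetical deterministic world: state -> {action: (next_state, reward)}
TRANSITIONS = {
    0: {"step": (1, 0.0), "jump": (2, -1.0)},
    1: {"step": (2, 0.0)},
    2: {"step": (3, 1.0)},
}


@lru_cache(maxsize=None)
def value(state):
    """V(s) = max over a of (R(s,a) + gamma * V(s'))."""
    if state == GOAL:
        return 0.0  # nothing more to earn once the game is over
    return max(
        reward + GAMMA * value(next_state)  # the same function is called on s'
        for next_state, reward in TRANSITIONS[state].values()
    )


print({s: value(s) for s in (0, 1, 2)})
```

With these numbers state 2 is worth 1.0, state 1 is worth 0.9, and state 0 is worth about 0.81, because taking two free steps beats paying for the jump.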

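Deterministic vs non-deterministic: in a deterministic environment there is no randomness, so the same action in the same state always gives the same result. A non-deterministic environment is stochastic, so the result of an action is only known as a probability.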

Above we did not add any randomness (deterministic), but nothing in this world is truly predictable, so let's add some. Each step is now only carried out with a certain probability. It makes our agent more natural (being drunk! lol): for example, the agent may have an 80% probability of going the way it was told, a 10% probability of going one other way, and a 10% probability of going yet another way.
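A rough sketch of that "drunk" agent in Python is below. The 80/10/10 split comes from the text. The idea that the two "other ways" are the sideways directions, and the names SIDEWAYS and actual_move, are assumptions made for this example.

```python
import random

# Assumed slip model: the two "other ways" are the sideways directions
SIDEWAYS = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def actual_move(intended, rng):
    """Return the direction the agent really goes: 80% intended, 10% each side."""
    side_a, side_b = SIDEWAYS[intended]
    return rng.choices([intended, side_a, side_b], weights=[0.8, 0.1, 0.1])[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
moves = [actual_move("up", rng) for _ in range(10_000)]
share_up = moves.count("up") / len(moves)
print(f"went up {share_up:.0%} of the time")
```

Over many tries the agent goes the intended way roughly 80% of the time, which is exactly the uncertainty the next section has to deal with.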

Markov Decision Process 

A Markov Decision Process (MDP) is essentially the Bellman equation catering for non-deterministic scenarios. Instead of using the value of a single next state, we loop through each possible next state, multiply the value of that state by its probability of occurring, and sum the results.

The Markov Property

This property states that the past does not matter: what happens next depends only on the current state, not on how we got there. It also saves memory, since we do not need to store past states.

Concept

A stochastic process has the Markov property if the conditional probability of its future states depends only on the current state. The MDP provides a mathematical framework for modeling decision making where outcomes are partly random and partly under the control of the decision-maker. It adds realism to the Bellman equation by changing it from deterministic to non-deterministic.

The equation

States: S. The states our agent can be in.
Model: T(s, a, s') ~ Pr(s' | s, a). Also known as the transition model. It describes the physics or rules of the environment: the function gives the probability of ending up in state s' after taking action a in state s.
Actions: A(s), A. The things you can do in a state.
Reward: R(s), R(s, a), R(s, a, s'). The reward function. R(s) is the reward for being in a state and is the main focus; the other forms also depend on the action taken and the state reached.
Policy: 𝝿(s) → a, 𝝿*. The solution to an MDP. It differs from a plan because a plan is just a sequence of actions, whereas a policy tells you what to do in any state, like a set of key-value pairs from state to action. 𝝿* is the optimal policy.

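V(s) = maxₐ (R(s,a) + ɣ ∑ₛ' P(s, a, s') V(s'))

Here P(s, a, s') is the transition model T(s, a, s'), and the sum runs over every possible next state s'.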
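One common way to solve this equation is to apply it over and over until the values stop changing (value iteration). The sketch below does that for a made-up four-state corridor. The corridor, the 80/10/10 transition numbers, the zero step reward, and fixing the goal's value at 1.0 are all assumptions for illustration.

```python
GAMMA = 0.9                       # assumed discount factor
STATES = [0, 1, 2, 3]
GOAL = 3                          # assumed terminal state
ACTIONS = {"left": -1, "right": +1}


def clamp(state):
    """Keep the agent inside the corridor."""
    return min(max(state, 0), GOAL)


def transitions(state, action):
    """P(s, a, s') as a list of (probability, next_state) pairs."""
    step = ACTIONS[action]
    # Assumed noise: 80% intended move, 10% opposite move, 10% stay put
    return [(0.8, clamp(state + step)), (0.1, clamp(state - step)), (0.1, state)]


def reward(state, action):
    """R(s, a): no reward along the way in this toy example."""
    return 0.0


def q_value(state, action, values):
    """R(s,a) + gamma * sum over s' of P(s,a,s') * V(s')."""
    expected_next = sum(p * values[s2] for p, s2 in transitions(state, action))
    return reward(state, action) + GAMMA * expected_next


def value_iteration(sweeps=100):
    values = {s: 0.0 for s in STATES}
    values[GOAL] = 1.0            # assumed: reaching the goal is worth 1.0
    for _ in range(sweeps):
        for s in STATES:
            if s != GOAL:
                values[s] = max(q_value(s, a, values) for a in ACTIONS)
    return values


values = value_iteration()
# The policy is the key-value mapping from each state to its best action
policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values))
          for s in STATES if s != GOAL}
print(values, policy)
```

In this corridor the values grow as the agent gets closer to the goal, and the resulting policy says "right" in every state.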


Living penalty: a negative reward the agent receives for being in a state (typically every state that does not end the game). Its main purpose is to make the agent want to finish the game as quickly as possible.
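A quick way to see the effect is to compare a short and a long route to the same goal. This is only a sketch: the function path_return, the penalty of -0.04 per move, and the goal reward of 1.0 are assumed values, and the discount is set to 1 so that the penalty is the only thing that differs.

```python
def path_return(steps, living_penalty, goal_reward=1.0, gamma=1.0):
    """Total discounted reward for reaching the goal after `steps` moves."""
    total = 0.0
    for t in range(steps):
        total += (gamma ** t) * living_penalty  # pay the penalty on every move
    return total + (gamma ** steps) * goal_reward


# Without a living penalty the agent has no reason to hurry
short_free, long_free = path_return(2, 0.0), path_return(5, 0.0)

# With a penalty (assumed -0.04 per move) the shorter route is worth more
short_paid, long_paid = path_return(2, -0.04), path_return(5, -0.04)

print(short_free, long_free, short_paid, long_paid)
```

Without the penalty both routes are worth the same, so wandering costs nothing. With it, every extra move lowers the total, which is what pushes the agent to finish quickly.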


