AI Simplified 2 : Bellman and Markov

Bellman equation 

It's named after Richard Bellman and gives a necessary condition for optimality in the mathematical optimization method known as dynamic programming. The deterministic form is:

V(s) = maxₐ (R(s,a) + ɣV(s'))


maxₐ: take the maximum over all possible actions a.
R(s, a): the reward for taking action a in state s.
ɣ: the discount factor. It works like the time value of money: future rewards are worth less than immediate ones.
s': the next state reached after taking action a in state s.

You can see the dynamic programming aspect in the recursive call: the value of s depends on the value of the next state s', so the equation solves the problem recursively.
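To make the recursion concrete, here is a minimal sketch of the deterministic Bellman equation on a hypothetical tiny world: states 0..3 along a line, where state 3 is the goal. All names here (step, bellman_value) are illustrative, not from any library.

```python
GAMMA = 0.9  # discount factor
GOAL = 3     # terminal state; reaching it earns reward 1

def step(s, a):
    """Deterministic transition: action -1 or +1 moves along the line."""
    return max(0, min(3, s + a))

def reward(s, a):
    """Reward 1 for the action that reaches the goal, 0 otherwise."""
    return 1.0 if step(s, a) == GOAL else 0.0

def bellman_value(s, depth=10):
    """V(s) = max_a (R(s,a) + gamma * V(s')), truncated at `depth`."""
    if s == GOAL or depth == 0:
        return 0.0
    return max(reward(s, a) + GAMMA * bellman_value(step(s, a), depth - 1)
               for a in (-1, 1))

print(bellman_value(0))  # discounted value of starting two steps from the goal
```

Starting at state 0, the agent is two discounted steps from the reward, so the value works out to 0.9 × 0.9 × 1 = 0.81.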

Deterministic vs. non-deterministic: a deterministic process is definite, with no randomness, whereas a non-deterministic process is stochastic.

The equation above assumed determinism, but little in the real world is truly predictable, so let's add randomness: each action now succeeds only with some probability. This makes our agent more natural (think of a slightly drunk agent, lol): it might follow the intended action with 80% probability, and veer off in each of two other directions with 10% probability each.

Markov Decision Process 

This is essentially the Bellman equation extended to non-deterministic scenarios: we loop through each possible next state, multiply the value of that state by its probability of occurring, and sum the results.
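That probability-weighted sum can be sketched in a few lines. This is a hypothetical example (the transition table and function names are made up for illustration), using the "drunk agent" probabilities from above:

```python
GAMMA = 0.9  # discount factor

# transitions[(s, a)] -> list of (probability, next_state).
# An illustrative "drunk agent": 80% intended move, 10% each slip.
transitions = {
    (0, "right"): [(0.8, 1), (0.1, 0), (0.1, 2)],
}

def q_value(s, a, V, reward):
    """Value of taking action a in state s:
    R(s,a) + gamma * sum over s' of P(s'|s,a) * V(s')."""
    expected = sum(p * V[s2] for p, s2 in transitions[(s, a)])
    return reward + GAMMA * expected

V = {0: 0.0, 1: 1.0, 2: 0.5}  # assumed values of the next states
print(q_value(0, "right", V, reward=0.0))
# 0.9 * (0.8*1.0 + 0.1*0.0 + 0.1*0.5) = 0.765
```

Note how each possible next state contributes to the total in proportion to its probability, exactly as the MDP prescribes.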

The Markov Property

This property states that the past does not matter: the future depends only on the present, i.e. the current state. It also helps us save memory, since we do not need to store past states.

Concept

A stochastic process has the Markov property if the conditional probability of its future states depends only on the current state. An MDP provides a mathematical framework for modeling decision making where outcomes are partly random and partly under the control of the decision-maker. It adds realism to the Bellman equation by turning it from deterministic into non-deterministic.

The equation

S: the state our agent is in.
Model: T(s, a, s') ~ Pr(s'|s, a), also known as the transition model. It encodes the physics or rules of the environment; the function returns the probability of landing in s' after taking action a in state s.
Actions: A(s), or simply A. The things you can do in a state.
Reward: R(s), R(s, a), or R(s, a, s'). This is the reward function; the simplest form, R(s), is the reward for being in a state.
Policy: 𝝿(s) → a, with 𝝿* denoting the optimal policy. The policy is the solution to an MDP. It differs from a plan: a plan is just a fixed sequence of actions, whereas a policy tells you what to do in any state (a mapping of state → action key-value pairs).

V(s) = maxₐ (R(s,a) + ɣ ∑ₛ' P(s'|s,a) V(s'))
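Putting it all together, here is a minimal value-iteration sketch that applies this equation to every state until the values stop changing. Everything here (the 3-state chain, the transition table P, the action names) is a made-up illustration, and it uses the R(s, a, s') form of the reward so the reward can ride along inside the probability-weighted sum:

```python
GAMMA = 0.9   # discount factor
THETA = 1e-6  # stop when values change less than this

# 3-state chain, state 2 terminal. P maps (s, a) -> [(prob, s', reward)].
P = {
    (0, "right"): [(0.8, 1, 0.0), (0.2, 0, 0.0)],
    (0, "left"):  [(1.0, 0, 0.0)],
    (1, "right"): [(0.8, 2, 1.0), (0.2, 1, 0.0)],
    (1, "left"):  [(1.0, 0, 0.0)],
}
states, terminal = [0, 1, 2], {2}

V = {s: 0.0 for s in states}
while True:
    delta = 0.0
    for s in states:
        if s in terminal:
            continue
        # V(s) = max_a sum over s' of P(s'|s,a) * (R + gamma * V(s'))
        v_new = max(
            sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[(s, a)])
            for a in ("left", "right")
        )
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < THETA:
        break

print({s: round(v, 3) for s, v in V.items()})
```

The loop sweeps all states repeatedly; because each update shrinks the error by at least the discount factor, the values converge, and the greedy action at each state then gives the optimal policy 𝝿*.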


Living penalty: a small negative reward for simply being in a (non-terminal) state. Its main purpose is to give the agent an incentive to finish the game as quickly as possible.


