In the last article, we discussed the total probability theorem; in this article, we discuss Bayes’ theorem.

Coming back to the monster example discussed in the last article, say we are told that the person does not come out alive, which means the event B has occurred (the events are defined in the image below, and it is assumed that if the monster is encountered, the person will not come out alive).

And we are given the following data:

We are interested in the question:

To begin with, the probability of taking the path A₁ is 1/3. Now that the event B has occurred, we are interested in the probability of A₁, so essentially we are interested in the conditional probability:

P(A₁ | B)

P(B) is not given in the problem statement, but we can compute it using the total probability theorem.
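Written out in the notation above, the two results being combined are the total probability theorem and Bayes’ theorem:

P(B) = P(A₁)·P(B | A₁) + P(A₂)·P(B | A₂) + P(A₃)·P(B | A₃)

P(A₁ | B) = P(A₁)·P(B | A₁) / P(B)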

Initially, the probability of choosing path A₁ was 1/3, but now that it is given that the person has not come out alive, the probability that the person took path A₁ has gone down to 0.182, and this makes intuitive sense as well. Let’s understand why this happened.

There was only a 30% chance of encountering a monster on path A₁, whereas the probability of encountering a monster on the other paths was much higher, so the other two paths were much riskier than path A₁. If we know that the person did not come out alive, we have more reason to believe that the person took one of the riskier paths, and hence the probability of the other two paths increases while the probability of path A₁ decreases.

If we use the same approach to compute the probability that the person took path A₃, it comes out greater than 1/3, which is indeed the case, as reflected in the image below.

Multiple tools are needed to compute the answer: the multiplication rule is leveraged, the total probability theorem is used, and the terms in the standard formulas are re-arranged to compute the required probability value.

In this example, B is the event that the person encountered a monster. Once we have the information that the event B has occurred, we update our beliefs about the causes (i.e. the path taken; the final outcome B could have been because of A₁, A₂, or A₃). That is how Bayes’ theorem works.
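To make this concrete, here is a small Python sketch of the computation. Note that the article only states P(B | A₁) = 0.3; the monster probabilities for the other two paths (0.6 and 0.75 below) are hypothetical values chosen so that the result matches the 0.182 quoted above.

```python
# Bayes' theorem for the monster example.
# Priors: each path is equally likely to be chosen.
priors = {"A1": 1/3, "A2": 1/3, "A3": 1/3}

# P(B | path): probability of not coming out alive on each path.
# Only 0.3 (path A1) is given in the article; 0.6 and 0.75 are
# hypothetical values consistent with the stated answer of 0.182.
p_b_given = {"A1": 0.3, "A2": 0.6, "A3": 0.75}

# Total probability theorem: P(B) = sum over paths of P(path) * P(B | path)
p_b = sum(priors[a] * p_b_given[a] for a in priors)

# Bayes' theorem: P(path | B) = P(path) * P(B | path) / P(B)
posterior = {a: priors[a] * p_b_given[a] / p_b for a in priors}

print(round(posterior["A1"], 3))  # 0.182 -- down from 1/3
print(round(posterior["A3"], 3))  # 0.455 -- up from 1/3, the riskiest path
```

Whatever values the other two paths actually have, the structure of the update is the same: the posterior of each path is its prior re-weighted by how likely that path is to produce the observed outcome.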

Let’s take one more example:

Say there are two ships trying to communicate with each other. Ship 1 sends a specific signal, say Signal 1, which is a danger signal, and it sends this signal only with a probability of 1%. The signal is transmitted over some channel, and since there is a long distance between the two ships, there might be some error in the transmission of the signals.

Let A be the event that Ship 1 sends out Signal 1, and B be the event that Ship 2 receives Signal 1.

Similarly, if Ship 1 has sent Signal 0, then there is a 5% chance that Ship 2 might receive it as Signal 1.

Essentially, there is a 5% transmission error: 5% of the time Signal 1 gets identified as Signal 0, and Signal 0 gets identified as Signal 1.

Now suppose Ship 2 receives Signal 1; we are interested in knowing the probability that Ship 1 was actually in danger, i.e. actually sent out Signal 1.

We can compute the required probability using Bayes’ theorem. We can think of B as the effect (what is received) and A as the cause: we have seen the effect (B has occurred) and we are trying to reason about the cause.

The communication is pretty accurate: 95% of the time Ship 2 receives the correct signal as transmitted by Ship 1, and yet P(A | B) comes out to about 0.16, meaning that even when we know Ship 2 has received Signal 1, we are only about 16% confident that Ship 1 actually sent Signal 1. Let’s understand this in more detail:

Suppose Ship 1 sends some 10000 signals (the square in the image below represents this sample space), of which 100 correspond to Signal 1, the danger signal (P(A) is given as 0.01). Because of the transmission error, only 95 of those 100 danger signals are received as Signal 1 (95% accuracy), but 495 of the 9900 Signal 0 transmissions are also received as Signal 1 (5% transmission error).

So, the yellow bounded rectangle in the above image is the entire sample space of signals received as Signal 1 at Ship 2, of which 95 actually correspond to the danger signal, whereas 495 were false alarms, meaning Signal 0 was sent but Signal 1 was received.

So, if we just look at the sample space of positive signals (this is the event B that has happened, meaning Signal 1 was received at Ship 2), only 95 of them actually correspond to the danger signal, and the probability P(A | B) in this sample space would be:

( 95 / (495 + 95) )

and this number comes out to about 16%.
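Plugging the numbers from this example into Bayes’ theorem (a quick Python check using only the probabilities stated above):

```python
# Ship-signal example: P(A) = 0.01, with a 5% transmission error each way.
p_a = 0.01              # P(A): Ship 1 sends Signal 1 (the danger signal)
p_b_given_a = 0.95      # Signal 1 is received correctly 95% of the time
p_b_given_not_a = 0.05  # Signal 0 gets flipped to Signal 1 5% of the time

# Total probability theorem: P(B) = P(A)P(B|A) + P(A')P(B|A')
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Bayes' theorem: P(A | B) = P(A)P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(round(p_a_given_b, 3))  # 0.161 -- the same as 95 / (495 + 95)
```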

The way to interpret this number: before any signal is received, there is only a 1% chance that Ship 1 sent a danger signal, since P(A) is 0.01. Given that Ship 2 has received Signal 1 (i.e. event B has occurred), we update this number to about 16%.

Bayes’ theorem is about updating existing beliefs, and that is indeed happening in this case: the probability of 1% (P(A)) updated to about 16%, and that’s a significant update.

Let’s take one more example:

Returning to the COVID example (discussed in this article): given that the test result has come out positive, we are interested in knowing the probability that the person is indeed infected:

Here we are given P(B | A) as 0.99, meaning the test is very accurate: in 99% of the cases it gives the correct result if a person is actually infected, and in 95% of the cases it gives the correct result if a person is not infected.

If we substitute the values as in the above image, we get the answer 0.6875, so there is only about a 68% chance that the person is actually infected if the test result comes out positive, even though the test is 95–99% accurate as discussed above. This again can be explained by reasoning with the help of numbers.

Suppose we conduct 10000 tests (the sample space); of these, only 10%, i.e. 1000 people, are actually infected (P(A) = 0.1), and the remaining 9000 people we are testing are not infected.

Now there is a 5% error in the test when the person is not infected and a 1% error when the person is infected. So in 5% of the cases where the person is not infected (5% of 9000, i.e. 450), we get a positive test result, and when the person is infected, in 99% of the cases we get a positive result (99% of 1000, i.e. 990).

So, if we look at the sample space of all the positive results, then 990 correspond to the event that the person is actually infected, the total number of outcomes is 450 (person not infected) + 990 (person infected), and the probability would be:

(990) / (450 + 990)

which is close to 68%
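The same computation in Python, using the probabilities stated above (prior 0.1, 99% accuracy when infected, 95% when not infected):

```python
# COVID-test example: prior P(A) = 0.1,
# P(B | A) = 0.99 (positive result when infected),
# P(B | A') = 0.05 (false positive when not infected, i.e. 1 - 0.95).
p_infected = 0.10
p_pos_given_inf = 0.99
p_pos_given_not = 0.05

# Total probability theorem: P(B) = P(A)P(B|A) + P(A')P(B|A')
p_pos = p_infected * p_pos_given_inf + (1 - p_infected) * p_pos_given_not

# Bayes' theorem: P(A | B) = P(A)P(B|A) / P(B)
p_inf_given_pos = p_infected * p_pos_given_inf / p_pos
print(round(p_inf_given_pos, 4))  # 0.6875 -- the same as 990 / (450 + 990)
```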

The way we interpret this number is that prior to doing the test, we believed there was only a 10% chance of the person being infected, but now that the test result has come out positive, that chance has significantly increased to ~68%.

The space corresponding to A complement (a person not infected) is much bigger than A to begin with: 90% of the population is not infected. So if we now get the result that there is a 68% chance that someone is infected, that probability conveys a significant update: earlier we had a 10% chance that someone was infected, but after seeing the evidence, that chance has increased to 68%.