# Binomial Distribution

In the last article, we discussed that a Bernoulli trial has only two possible outcomes. In this article, we build on the concept of the Bernoulli trial to understand the binomial distribution.

Suppose we repeat a Bernoulli trial ‘n’ times (for example, tossing a coin ‘n’ times) under the following conditions:

1. The ‘n’ trials are independent, which means that the success or failure of one trial does not affect the result of any other trial.
2. The ‘n’ trials are identical, meaning the probability of success is the same in each trial.

In such cases, we are interested in the following question: what is the probability of getting exactly ‘k’ successes in ‘n’ trials?

Before we answer this question, let’s see some examples where this concept is used:

This matters, for example, when packing ‘n’ ball bearings in a box: we want to be very sure that there is only a very small probability that ‘k’ (or more) of them are defective. Checking a single bearing is a Bernoulli trial, and a box is an instance of running that trial ‘n’ times.

As another example, suppose an e-commerce platform estimates that the probability that a customer makes a purchase on the platform is ‘p’.

The answer to the above question can feed into estimating expected profits for the current quarter, targeting a specific customer base, and so on.

Similarly, we have in the marketing domain:

And this helps to decide how many emails to send.

Now that we have the concept, let’s formally define the random variable for the binomial distribution.

Sample space: we perform ‘n’ trials, and each trial has 2 possible outcomes, success or failure. For example, the image below shows two possible scenarios: the random variable maps the first one to the number 3 because it contains three successes, and it maps the second one to the number 1 because it contains one success.

There are many such outcomes, and the random variable maps each of them to a real number. The maximum value the random variable can take is ‘n’ (for ‘n’ trials), when every trial results in a success, and the minimum value is 0, when every trial results in a failure. So, the random variable can take values from 0 to ‘n’.

We are interested in the function pₓ(x), the distribution of the random variable: it takes a value ‘x’ and returns the probability of ‘x’ successes in ‘n’ trials. Ideally, this function should be defined by just a few parameters.

Let’s look at some examples from which we can derive the general formula for the probability function:

So far we know how to compute the probability of events and how to link back the values that the random variable can take to the events.

In this case, the random variable can take values from 0 to ‘n’, but we have not yet described the sample space. So let’s first see what the sample space looks like; we can then list the events in it and use them to derive the general formula for the probability mass function.

The first question to ask about the sample space is: how many outcomes are possible in ‘n’ trials?

We have two possible outcomes (success, failure) for each trial, so we can think of an outcome as a sequence of length ‘n’ built from a set of 2 objects with repetition allowed.

So, there are 2ⁿ such possible outcomes. If we take the example of tossing 3 coins, the formula gives 2³ = 8 possible outcomes, and that is indeed the case: the image below depicts the 8 possible outcomes (note that the terms heads and tails are used instead of success and failure).
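The counting argument above can be checked by listing the sample space directly. The following is a minimal sketch; the function name `sample_space` and the use of the letters H and T are illustrative choices, not part of the original article.

```python
from itertools import product


def sample_space(n):
    """Return every outcome of n coin tosses as a string such as 'HTT'."""
    # product("HT", repeat=n) builds all sequences of length n with repetition
    return ["".join(seq) for seq in product("HT", repeat=n)]


outcomes = sample_space(3)
print(len(outcomes))  # number of outcomes, equal to 2**3
print(outcomes)
```

For larger ‘n’ the list grows as 2ⁿ, so explicit enumeration like this is only practical for small examples.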

The random variable, in this case, maps each outcome to the number of heads it contains (let’s say we define success as getting heads), so its range is 0 to 3, as we can get at most 3 heads.

Each of the outcomes is therefore mapped to exactly one value of the random variable.

We are interested in the case n = 3 and k = 1, which means exactly one success in 3 trials. The event corresponding to k = 1, call it A, contains all the outcomes with exactly 1 head.

The probability of the random variable taking on the value 1 is just the probability of the event A.

This event is a collection of 3 disjoint outcomes, so its probability is the sum of the probabilities of these 3 outcomes.
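Written out for the three-coin example, with X denoting the random variable (a symbol assumed here for convenience), the event and its probability are:

$$
A = \{\text{HTT},\ \text{THT},\ \text{TTH}\}, \qquad P(X = 1) = P(A) = P(\text{HTT}) + P(\text{THT}) + P(\text{TTH})
$$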

Let’s take any one term from this, say {HTT}. This outcome means we get a success in the first trial and a failure in the second and third trials. Since the probability of success is ‘p’, the probability of failure is ‘1-p’, and the 3 tosses are independent, we can write the probability of this outcome as follows:
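Because the tosses are independent, the probabilities of the individual results multiply:

$$
P(\text{HTT}) = p \cdot (1-p) \cdot (1-p) = p\,(1-p)^2
$$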

Similarly, we can get the probability of each of the other two outcomes:
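Applying the same multiplication to the outcomes with the head in the second and third position gives:

$$
P(\text{THT}) = (1-p)\,p\,(1-p) = p\,(1-p)^2, \qquad P(\text{TTH}) = (1-p)\,(1-p)\,p = p\,(1-p)^2
$$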

Note that the probability of each of the 3 possible outcomes (where we get exactly 1 head) is the same. That is expected, since each of them contains one success term and two failure terms.

We can rewrite the sum of these terms as the following:

Let’s see why the number 3 appears in the answer. There are three terms in the sum, but we are more interested in why this event has exactly three outcomes:

We have 3 trials, and we want the size of the event A (the number of outcomes in A). The event contains all outcomes with exactly 1 success, so we are choosing 1 trial out of 3 to be the success, and the number of ways of doing that is ³C₁ = 3: the success is either the first trial, the second trial, or the third trial.
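Adding the three equal terms for the outcomes HTT, THT and TTH (X again denotes the random variable, a symbol assumed here):

$$
P(X = 1) = p\,(1-p)^2 + p\,(1-p)^2 + p\,(1-p)^2 = 3\,p\,(1-p)^2
$$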
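The claim that the size of the event equals the number of ways of choosing the successful trials can be verified by brute force. This is a small sketch; the helper name `count_outcomes_with_k_successes` and the labels S and F are hypothetical, chosen only for illustration.

```python
from itertools import product
from math import comb


def count_outcomes_with_k_successes(n, k):
    """Count length-n success/failure sequences containing exactly k successes."""
    return sum(1 for seq in product("SF", repeat=n) if seq.count("S") == k)


# Compare the brute-force count with the binomial coefficient nCk
for n, k in [(3, 1), (3, 2), (5, 2)]:
    print(n, k, count_outcomes_with_k_successes(n, k), comb(n, k))
```

The two counts agree for every pair, which is why the binomial coefficient can stand in for the enumeration.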

Now let’s take the case n = 3 and k = 2, so we are looking at two successes in 3 trials. The outcomes HHT, HTH and THH belong to this event.

In this case, the number of ways of choosing 2 successes from 3 trials is ³C₂. As there are two successes and one failure, each outcome contributes two factors of ‘p’ and one factor of ‘1-p’, and the probability of this event is given as:
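Combining the number of outcomes with the probability of each one (X denotes the random variable, a symbol assumed here):

$$
P(X = 2) = \binom{3}{2}\, p^2\, (1-p) = 3\,p^2\,(1-p)
$$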

The following are the observations we make from the above two examples:

The general formula for computing the probability of any value that the random variable can take would be given by:
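Generalising the two examples, with ⁿCₖ ways of placing ‘k’ successes among ‘n’ trials, ‘k’ factors of ‘p’ and ‘n-k’ factors of ‘1-p’:

$$
p_X(k) = P(X = k) = \binom{n}{k}\, p^{k}\, (1-p)^{\,n-k}, \qquad k = 0, 1, \dots, n
$$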

Now that we have the formula in place, we can compute the probability of the random variable taking on any possible value. The entire distribution is fully specified by 2 parameters: ‘p’ and ‘n’.

‘k’ is the input to the function.
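As a rough illustration, the formula translates into a few lines of Python using only the standard library. The function name `binomial_pmf` and its argument order are assumptions made for this sketch.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    if not 0 <= k <= n:
        return 0.0  # values outside 0..n are impossible
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Three-coin example with a fair coin (p = 0.5 is an illustrative choice)
print(binomial_pmf(1, 3, 0.5))  # 3 * 0.5 * 0.25 = 0.375
```

Here ‘n’ and ‘p’ play the role of the parameters, while ‘k’ is the value whose probability we ask for.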

Now that we know what binomial distribution is, let’s look at some examples of the binomial distribution.

Example 1: Social Distancing

We assume that anyone who comes into close proximity with a COVID-19 infected patient will get infected.

Let’s see how this is related to the binomial distribution. The first thing we need to understand is what the trial is in this case:

The experiment here is that you come into close proximity with a person. Every time this experiment is repeated, there are two possible outcomes: you either get infected or you don’t, depending on whether the other person is infected.

As there are two possible outcomes, this is a Bernoulli trial. We repeat this experiment 50 times (given in the problem statement), and ‘p’ in this case is 0.1, as there is a 10% chance of getting infected in each contact.

So, we have converted this into a task where a binomial random variable makes sense. We are interested in the probability of getting infected, which happens if at least 1 of the 50 people the person comes in contact with is infected.

We discussed in an earlier article that whenever there is a constraint such as “at least one”, we can use the subtraction principle, so we rewrite the required probability as below:

This is easy to compute: with n = 50 and p = 0.1, we pass k = 0 to the probability function of the binomial distribution and subtract the result from 1 to get the required answer.
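With X denoting the number of infected people among the 50 contacts (a symbol assumed here), the subtraction principle gives:

$$
P(X \ge 1) = 1 - P(X = 0) = 1 - \binom{50}{0}\,(0.1)^{0}\,(0.9)^{50} = 1 - (0.9)^{50}
$$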
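A minimal sketch of this computation, using the standard library only. The names `binomial_pmf`, `n_contacts` and `p_infected` are illustrative, not taken from the original article.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_contacts = 50   # number of people met (from the problem statement)
p_infected = 0.1  # chance that any one contact is infected

p_nobody_infected = binomial_pmf(0, n_contacts, p_infected)
p_at_least_one = 1 - p_nobody_infected  # subtraction principle

print(f"{p_at_least_one:.4f}")  # about 0.9948
```

Under these assumptions, meeting 50 people makes infection almost certain, even though each individual contact carries only a 10% risk.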

Let’s add some variation to this problem:

Let’s change the value of ‘p’

Let’s take another example:

There are 25 students in the class. We can think of a trial as selecting a student, repeated 25 times, and call it a success when the student is a Linux user, so the probability of success is 10%.

For the first part, the question asks for the probability that exactly 3 students are using Linux, which means exactly 3 successes in the 25 trials.

For the second part, the value of ‘k’ lies in the range 2 to 6, both inclusive. These are disjoint events, so their probabilities add up.

For the third part, we change ‘p’ to 0.9.

We can plug these values into the standard formula and get the answers, but let’s try getting them from a plot instead: a bar graph showing the probability for each value of ‘k’.

We use the ‘stats’ package available in ‘scipy’, in particular its ‘binom’ distribution, and pass in the required values of ‘n’ and ‘p’.

As the random variable can take values from 0 to 25, we set ‘x’ to the range of integers starting at 0, and each ‘x’ value corresponds to ‘k’ in the function for the binomial distribution. Note that if the upper bound of the range is exclusive, it has to be 26 for k = 25 to be included.
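The following is a hedged sketch of what such a plotting script might look like; it is not the original listing. It assumes `numpy`, `scipy` and `matplotlib` are installed, and the output file name `binomial_pmf.png` and the axis labels are arbitrary choices made here.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom

n, p = 25, 0.1                   # 25 students, 10% Linux users
x = np.arange(0, n + 1)          # k = 0, 1, ..., 25 (upper bound is exclusive)
pmf_values = binom.pmf(x, n, p)  # P(X = k) for every k

fig, ax = plt.subplots()
ax.bar(x, pmf_values)
ax.set_xlabel("k (number of Linux users)")
ax.set_ylabel("P(X = k)")
ax.set_title(f"Binomial distribution, n = {n}, p = {p}")
fig.savefig("binomial_pmf.png")  # hypothetical output path
```

Changing `p` to 0.9 or 0.5 and re-running the script produces the other plots discussed in this example.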

We can draw interesting insights from this plot. For example, for values of ‘k’ from about 9 onwards, the probabilities are so small that the bars do not show up on the plot: for ‘k’ as 9, the formula gives a probability of only about 0.0004.
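Plugging n = 25, p = 0.1 and k = 9 into the binomial formula shows how small this value is (X denotes the number of Linux users, a symbol assumed here):

$$
P(X = 9) = \binom{25}{9}\,(0.1)^{9}\,(0.9)^{16} \approx 3.8 \times 10^{-4}
$$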

Now that we have the plot, we can answer the questions being asked:

For the first part, we can read the probability off the plot for ‘k’ as 3, which gives approximately 0.226.

For the second part, we add the respective probability values for ‘k’ from 2 to 6.

We also note that the peak probability is at ‘k’ as 2, which makes intuitive sense: there are 25 students in total, and if 10% of students are Linux users, we expect about 2.5 of the students we choose to be Linux users. That is why the probabilities for ‘k’ as 2 and 3 are the highest.
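Reading values off a plot is approximate, so it can help to compute the same quantities numerically. A small sketch with the standard library follows; the names `binomial_pmf`, `n_students` and `p_linux` are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_students, p_linux = 25, 0.1

# Part 1: exactly 3 Linux users
p_exactly_3 = binomial_pmf(3, n_students, p_linux)

# Part 2: between 2 and 6 Linux users, both inclusive (disjoint events add up)
p_between_2_and_6 = sum(binomial_pmf(k, n_students, p_linux) for k in range(2, 7))

print(f"P(X = 3)       = {p_exactly_3:.4f}")
print(f"P(2 <= X <= 6) = {p_between_2_and_6:.4f}")
```

Note that `range(2, 7)` is needed to include k = 6, because the upper bound of `range` is exclusive.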

Also, in this plot, all the tall bars are towards the left and this is because the probability of success is 0.1 and we expect a lower number of successes.

In the third part, we change ‘p’ to 0.9, so the probability of success is high. Now a large number of successes is likely and a small number of successes is very unlikely. For example, if 90% of the 25 students are Linux users, we expect 22–23 students to be using Linux, and the probability of just 1 student using Linux is very low.

Now we have the tallest bars towards the right and that makes sense because getting a few successes is unlikely.

If we plot the distribution for ‘p’ as 0.5, we get the plot below:

Now the tallest bars are for ‘k’ as 12 and 13. As the probability of success is 0.5, we expect neither too many failures nor too many successes. The distribution is also symmetric in this case, which follows from the binomial formula: with p = 0.5, the probabilities of ‘k’ and ‘n-k’ successes are equal.
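The position of the tallest bars for the three values of ‘p’ can also be found without a plot. This is a sketch under the same setup (n = 25); the helper `most_likely_k` is a hypothetical name introduced here.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def most_likely_k(n, p):
    """Return every k whose probability equals the maximum (ties included)."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(probs)
    return [k for k, prob in enumerate(probs) if abs(prob - peak) < 1e-12]


for p in (0.1, 0.5, 0.9):
    print(p, most_likely_k(25, p))
```

For p = 0.5 the function returns two values, reflecting the symmetry of the distribution around n/2 = 12.5.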

## Is Binomial distribution a valid distribution?

We need to verify a couple of properties for the binomial distribution, namely the properties that any valid PMF must satisfy.

The first property is that the probability of any value that the random variable can take is greater than or equal to 0.

We can prove this by looking at the formula: it is the product of the binomial coefficient ⁿCₖ, pᵏ and (1-p)ⁿ⁻ᵏ.

Since 0 ≤ p ≤ 1, each of these three factors is non-negative, and the product of three non-negative numbers is non-negative.

The second property is that if we sum the probabilities of all the values that the random variable can take, the sum must be 1.

We write down all the terms in the summation using the formula for each value of ‘k’, and compare the result with the binomial expansion of (a + b)ⁿ with a = p and b = 1 - p, which gives the required proof:
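Using the binomial theorem, $(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^{k} b^{\,n-k}$, with $a = p$ and $b = 1 - p$ (X denotes the random variable, a symbol assumed here):

$$
\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \binom{n}{k}\, p^{k}\, (1-p)^{\,n-k} = \big(p + (1-p)\big)^{n} = 1^{n} = 1
$$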

So, the binomial distribution also follows the properties of a standard PMF.
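Both properties can also be spot-checked numerically for a few parameter choices. This is only a sanity check, not a proof; the function names and the particular (n, p) pairs are arbitrary and chosen for illustration.

```python
from math import comb, fsum, isclose


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def is_valid_pmf(n, p):
    """Check non-negativity and that the probabilities sum to 1."""
    probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
    non_negative = all(prob >= 0 for prob in probs)
    sums_to_one = isclose(fsum(probs), 1.0, rel_tol=1e-9)
    return non_negative and sums_to_one


for n, p in [(3, 0.5), (25, 0.1), (50, 0.9), (10, 0.0), (10, 1.0)]:
    print(n, p, is_valid_pmf(n, p))
```

The edge cases p = 0 and p = 1 are included because some probabilities are exactly 0 there, which is why the first property is stated as "greater than or equal to 0".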

The Bernoulli distribution is a special case of the binomial distribution where ‘n’ equals 1 (a single trial). Here ‘k’ can only take the value 0 or 1, and the probabilities of the two outcomes (0, 1) come out as ‘1-p’ and ‘p’ respectively.
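Setting n = 1 in the binomial formula makes this explicit (X denotes the random variable, a symbol assumed here):

$$
P(X = 0) = \binom{1}{0}\, p^{0}\, (1-p)^{1} = 1 - p, \qquad P(X = 1) = \binom{1}{1}\, p^{1}\, (1-p)^{0} = p
$$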