In this article, we discuss the Bernoulli distribution, which can be specified compactly by a single parameter and which describes experiments with only two possible outcomes.
There are many such experiments across domains where the result can take only two possible values. For example, a blood test for a particular disease could come back positive or negative, an exam could end in a pass or a fail, a new movie could be a hit or a flop, and so on.
In general, we can think of these as experiments in which one outcome is a success and the other is a failure. What we call a success and what we call a failure varies from domain to domain and task to task, but there are always two possible outcomes, and they are complements of each other. Such experiments are known as Bernoulli trials.
In such trials, it makes sense to define a random variable that takes the outcome of the experiment as input and maps it to 0 or 1 (by convention, 1 for success and 0 for failure).
Such a random variable is known as a Bernoulli random variable, and we are interested in its distribution, that is, the probabilities assigned to the two values that the random variable can take.
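As a minimal sketch in Python, the mapping from outcomes to 0 or 1 can be written as a small function. The outcome labels and the name bernoulli_rv are hypothetical, chosen only for illustration; which outcome counts as a success is a modelling choice.

```python
# Hypothetical outcome labels for a blood test. Calling "positive" the
# success is an assumption made for this example only.
SUCCESS = "positive"
FAILURE = "negative"


def bernoulli_rv(outcome: str) -> int:
    """Map an experiment outcome to 1 (success) or 0 (failure)."""
    if outcome == SUCCESS:
        return 1
    if outcome == FAILURE:
        return 0
    raise ValueError(f"unexpected outcome: {outcome!r}")


results = ["positive", "negative", "negative", "positive"]
values = [bernoulli_rv(r) for r in results]
print(values)  # [1, 0, 0, 1]
```

Any two-outcome experiment can be handled the same way by changing the two labels.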
We can define an event A that corresponds to the outcome being a success, and denote the probability of this event by ‘p’.
Now we define a function Pₓ(x) that takes an input ‘x’ (where x is 0 or 1, the two values that a Bernoulli random variable can take) and returns a probability value between 0 and 1.
In short, this function tells us the probability that the random variable takes on the value ‘x’ (x is 0 or 1 in the case of a Bernoulli trial).
Now Pₓ(1) is the probability of success (the probability that the random variable takes on the value 1), and as discussed above, we call this probability ‘p’. Since there are only two possible outcomes, the probability of failure follows directly from the probability of success: Pₓ(0) is ‘1 - p’.
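One way to build intuition for these two probabilities is to simulate many trials and look at the observed fractions of 1s and 0s. The sketch below uses only Python's standard library; the value p = 0.3, the sample size, and the name simulate_bernoulli are assumptions made for illustration.

```python
import random


def simulate_bernoulli(p: float, n: int, seed: int = 0) -> list[int]:
    """Draw n independent 0/1 samples with success probability p."""
    rng = random.Random(seed)
    # rng.random() is uniform on [0, 1), so it falls below p with probability p.
    return [1 if rng.random() < p else 0 for _ in range(n)]


p = 0.3  # assumed probability of success, for illustration only
samples = simulate_bernoulli(p, n=100_000)
freq_success = sum(samples) / len(samples)

print(f"fraction of 1s: {freq_success:.3f}")      # should be close to p
print(f"fraction of 0s: {1 - freq_success:.3f}")  # should be close to 1 - p
```

With a large number of trials, the two fractions should settle near p and 1 - p respectively.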
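We can write the same thing as a single compact function:
Pₓ(x) = p^x (1 - p)^(1 - x), for x in {0, 1}
Setting x = 1 gives p and setting x = 0 gives 1 - p, so the above equation is a compact way of writing the Bernoulli distribution.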
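The compact form p^x (1 - p)^(1 - x) translates almost directly into code. The following is a small sketch rather than a reference implementation; the function name bernoulli_pmf and the example value p = 0.7 are assumptions.

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """Return the probability that a Bernoulli(p) random variable equals x."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    # For x = 1 the second factor is 1, leaving p.
    # For x = 0 the first factor is 1, leaving 1 - p.
    return p**x * (1 - p) ** (1 - x)


p = 0.7  # assumed probability of success
print(bernoulli_pmf(1, p))  # probability of success, p
print(bernoulli_pmf(0, p))  # probability of failure, 1 - p (up to rounding)
```

The single expression covers both cases without an explicit if/else on x, which is what makes the compact form convenient.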
We can also confirm that this distribution satisfies the two properties of a PMF.
The first property is that every probability value must lie between 0 and 1. Since ‘p’, the probability of success, is always greater than or equal to 0 and less than or equal to 1 (as the axioms of probability require of any probability value), the same holds for 1 - p, so both Pₓ(1) = p and Pₓ(0) = 1 - p satisfy the first property.
The second property is that if we sum the probabilities of all the values that the random variable can take, the sum must be 1.
We have just two possible values, so we can add up their probabilities:
Pₓ(0) + Pₓ(1) = (1 - p) + p = 1
So, the mathematical equation for a Bernoulli distribution satisfies the properties of a PMF.
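Both properties can also be checked numerically for a handful of values of p. This is only a sanity check under the assumption that the PMF is p^x (1 - p)^(1 - x); the helper name bernoulli_pmf and the grid of p values are illustrative.

```python
import math


def bernoulli_pmf(x: int, p: float) -> float:
    """Probability that a Bernoulli(p) random variable equals x (0 or 1)."""
    return p**x * (1 - p) ** (1 - x)


for p in [0.0, 0.1, 0.25, 0.5, 0.9, 1.0]:
    probs = [bernoulli_pmf(x, p) for x in (0, 1)]
    # Property 1: every probability lies between 0 and 1.
    assert all(0.0 <= q <= 1.0 for q in probs)
    # Property 2: the probabilities sum to 1 (up to floating-point rounding).
    assert math.isclose(sum(probs), 1.0)

print("both PMF properties hold for all tested values of p")
```

A numerical check like this does not replace the algebraic argument, but it is a quick way to catch a mistake in an implementation.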