Probability Mass Function

Parveen Khurana
5 min read · Oct 9, 2020

In the last article, we discussed the concept of a random variable. In this article, we discuss the probability mass function, which gives the probability of each value that the random variable can take.

Let’s take the example of rolling two dice, where X is the random variable that maps each outcome to a real number (in this case, the sum of the numbers on the two dice).

The question of interest is the probability that the random variable takes the value x (where x is an integer from 2 to 12 in this case).

This probability lies between 0 and 1.

So, a distribution is essentially a table or mapping that gives the probability of every value that the random variable can take. In one column of the table, we have all the values that the random variable can take, and in the other column, we have the probability corresponding to each of these values.

For the case of rolling a single die, we have the following distribution (assuming the die is fair, so all outcomes are equally likely): each of the values 1, 2, 3, 4, 5 and 6 has probability 1/6.

Similarly, we can build this distribution table for the case of rolling two dice simultaneously.

Here the random variable maps each outcome to the sum of the numbers on the two dice, and we can note down which outcomes give a particular value. For example, if the random variable takes the value 2, only one outcome corresponds to that case: (1, 1).

We can note this down for every value. Once each value is linked to an event (the set of outcomes that produce it), we can compute its probability from that event. For example, the value 2 corresponds to the event containing all outcomes whose sum is 2. There is just one such outcome, (1, 1), out of a total of 36 equally likely outcomes, so the probability that the random variable takes the value 2 is 1/36.
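The counting argument above can be sketched in a few lines of Python. This is only an illustrative sketch, assuming two fair six-sided dice; the function name `two_dice_pmf` and the use of `Fraction` for exact arithmetic are choices made here, not part of the original article.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """PMF of X = sum of two fair six-sided dice, built by enumerating outcomes."""
    # Sample space: 36 equally likely ordered pairs (first die, second die).
    outcomes = list(product(range(1, 7), repeat=2))

    # For each value x, collect the event {outcomes that X maps to x}.
    events = defaultdict(list)
    for outcome in outcomes:
        events[sum(outcome)].append(outcome)

    # Probability of a value = (outcomes in its event) / (total outcomes).
    return {x: Fraction(len(event), len(outcomes))
            for x, event in sorted(events.items())}


pmf = two_dice_pmf()
for x, p in pmf.items():
    # Fraction reduces automatically, so 2/36 is shown as 1/18.
    print(x, p)
```

Looking up `pmf[2]` gives 1/36, matching the single outcome (1, 1) counted above.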

Instead of a tabular display, we could show the same distribution graphically. On the x-axis, we have the values that the random variable can take, and on the y-axis, the corresponding probabilities. For example, from such a plot we can read that the probability that the random variable takes the value 6 (x-axis value) is 5/36 (y-axis value).

So, we have our function pₓ(x), where the subscript refers to the name of the random variable X and the ‘x’ in parentheses is a value that the random variable can take. The function returns the probability that the random variable takes that value: pₓ(x) = P(X = x). This in turn is the probability of an event, namely the collection of all outcomes in the sample space to which the random variable assigns the value ‘x’.

This function is called the probability distribution of the random variable, also known as the probability mass function (PMF) or simply the distribution.

Properties of the Probability Mass Function (PMF)

In another article, we discussed the axioms of probability. Here we discuss how those axioms apply to a probability mass function.

1. The output of the PMF must be greater than or equal to 0, in line with the first axiom of probability. Since each PMF value is the probability of a particular event, and the probabilities of events satisfy the axioms of probability, this condition is automatically satisfied.

2. If we sum the probabilities of all the values that the random variable can take, that sum must be 1. In other words, the sum of pₓ(x) over all x in Rₓ equals 1.

Rₓ is a subset of the real numbers: the set of values that the random variable can take (also called the support of the random variable).

In the two-dice example, Rₓ is the set of integers from 2 to 12 (the possible sums of the numbers on two dice).

Let’s verify this property for the case of rolling two dice.
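Written out for the two-dice example, the sum takes roughly the following form. This is a reconstruction from the surrounding text, and Ω is used here as an assumed symbol for the sample space.

```latex
\sum_{x \in R_X} p_X(x)
  = P(X = 2) + P(X = 3) + \dots + P(X = 12)
  = P\big(\{X = 2\} \cup \{X = 3\} \cup \dots \cup \{X = 12\}\big)
  = P(\Omega)
  = 1
```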

The events {X = 2}, {X = 3}, ..., {X = 12} are disjoint and together partition the sample space, so the sum of their probabilities equals the probability of the sample space, which is 1. The second axiom is therefore satisfied. The third axiom (additivity for disjoint events) is what is being used here: we computed the probability of a larger event (the sample space) by adding the probabilities of smaller, disjoint events.
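Both properties can also be checked numerically for the two-dice example. The snippet below is a minimal sketch, again assuming fair six-sided dice; the variable names (`support`, `non_negative`, `total`) are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Rebuild the two-dice PMF: each of the 36 ordered outcomes adds 1/36
# to the probability of the sum it produces.
pmf = {}
for first, second in product(range(1, 7), repeat=2):
    x = first + second
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, 36)

support = sorted(pmf)                             # the set R_X
non_negative = all(p >= 0 for p in pmf.values())  # property 1
total = sum(pmf.values())                         # property 2: should equal 1

print(support)
print(non_negative, total)
```

Exact fractions are used instead of floats so that the sum can be compared to 1 without rounding error.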

References: PadhAI
