Random Variable
In the last few articles, we discussed experiments, outcomes, the sample space (the collection of all outcomes), and events (subsets of the sample space). We then covered the axioms and properties of probability, Bayes' theorem, and the total probability theorem. In all of these, we were mainly dealing with events and outcomes.
In this article, we discuss the numerical quantities associated with the outcomes of the experiment. Let’s understand what we mean by numerical quantities with the help of an example:
Say we roll 2 dice simultaneously. Here is what the sample space looks like:
We are interested in a numerical quantity associated with each outcome: the sum of the numbers on the two dice (shown in the image below).
So, out of a total of 36 possible outcomes, we are interested in 11 numerical quantities associated with those 36 outcomes.
The first outcome is linked to the sum 2, the next two outcomes to the sum 3, the next three outcomes to the sum 4, and so on.
Every possible outcome is mapped to some numerical quantity and every outcome is mapped to exactly one quantity.
More than one outcome might map to the same numerical quantity.
In this case, the numerical quantities form a subset of the integers, but in general we map the outcomes in the sample space to real numbers.
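The mapping from outcomes to sums can be sketched in Python. This is a minimal illustration rather than anything from the original article: the names sample_space, dice_sum and mapping are made up for the example, and the dice are assumed to be ordinary six-sided dice.

```python
from itertools import product

# Sample space: all ordered pairs (first die, second die), 36 outcomes in total.
sample_space = list(product(range(1, 7), repeat=2))

def dice_sum(outcome):
    """Hypothetical helper: map one outcome to its numerical quantity (the sum)."""
    first, second = outcome
    return first + second

# Every outcome is mapped to exactly one number.
mapping = {outcome: dice_sum(outcome) for outcome in sample_space}

# Several outcomes can share the same number, so there are fewer sums than outcomes.
distinct_sums = sorted(set(mapping.values()))

print(len(sample_space))  # 36
print(distinct_sums)      # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Using a dictionary keyed by outcome makes the "exactly one quantity per outcome" property explicit, since a key can hold only one value.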
Now that we have mapped the outcomes to numerical quantities, a natural question arises: “What is the probability that the sum takes a particular value, say 5?”
We are interested in the probabilities of the numerical quantities, not of the individual outcomes. To compute those probabilities we may have to go back to the sample space, and that is fine, but our interest lies in the numerical quantities associated with the outcomes.
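To make "going back to the sample space" concrete, here is a small sketch that computes the probability of a sum by counting the outcomes that produce it. It assumes two fair six-sided dice, so that all 36 outcomes are equally likely; the helper name prob_sum_equals is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of rolling two six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

def prob_sum_equals(target):
    """Hypothetical helper: P(sum == target), assuming equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if sum(outcome) == target]
    return Fraction(len(favourable), len(sample_space))

p_five = prob_sum_equals(5)
print(p_five)  # 1/9, from the 4 outcomes (1,4), (2,3), (3,2), (4,1)
```

The question is asked about the number 5, but the answer is obtained by collecting every outcome that maps to 5 and adding up their probabilities.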
This kind of mapping shows up in every domain. For example, in a class of students, we are interested in real numbers associated with the students. Say the experiment is selecting a student: each student is an outcome, the sample space contains all the students in the class, and we map each student’s identity to their CGPA record.
We are not interested in the probability of the student being Student X or Student Y; we are interested in numerical quantities associated with these students. One such quantity is the CGPA.
Here as well, every outcome is mapped to some real number and no outcome is mapped to more than one number.
And here again, we are interested in the probability of numerical quantities and not the outcomes.
The numerical quantity could be anything: in addition to CGPA, it could be height, weight, age, and so on.
Let’s take another example:
Say the sample space is all the employees of an organization and the experiment is randomly selecting an employee. We might be interested in some numerical quantities associated with that employee.
For the same sample space, there could be multiple quantities of interest.
Yet another example: say the sample space consists of all the farms in a state. Here the quantities of interest could be the size of the farm, the yield per acre, and so on.
So, the same sample space is mapped to multiple real quantities and we are interested in the probability of these quantities.
In all the domains, we would be able to map the sample space to some numerical quantities/real numbers.
We can think of this mapping as a function: it takes one element of the sample space and maps it to some value. This function is called a random variable (the input to the function is random).
The set of all inputs that the function can take is called the domain of the function, and the set of corresponding outputs is called the range of the function.
There can be multiple random variables associated with a given domain. For example, if we take the domain to be the employees, one random variable might give the salary of the employee, another might give the height of the employee, and so on. Similarly, if we take the domain to be students, we have multiple random variables associated with it.
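As a rough sketch of several random variables sharing one domain, the snippet below defines two functions on the same set of employees. The employee IDs and all the numbers are invented purely for illustration.

```python
# Hypothetical employee records; the IDs and numbers are made up for illustration.
employees = {
    "emp_a": {"salary": 52000, "height_cm": 168.0},
    "emp_b": {"salary": 61000, "height_cm": 175.5},
    "emp_c": {"salary": 52000, "height_cm": 181.2},
}

# Two random variables defined on the same domain (the set of employees).
def salary(employee_id):
    return employees[employee_id]["salary"]

def height(employee_id):
    return employees[employee_id]["height_cm"]

domain = set(employees)                     # all inputs the functions accept
salary_range = {salary(e) for e in domain}  # values the salary variable takes
height_range = {height(e) for e in domain}  # values the height variable takes

print(sorted(salary_range))  # [52000, 61000]
print(sorted(height_range))  # [168.0, 175.5, 181.2]
```

Two employees share a salary here, so the range of the salary variable has fewer elements than the domain, just as several dice outcomes share the same sum.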
Usually, in programming, we define a function as follows:
function_name(parameter1, parameter2, …)
That is, we have a function name and the list of parameters that the function takes,
whereas we refer to a random variable just by its name, say X; the outcome it takes as input is left implicit.
Now that we are clear on what random variables are, the next question of interest is what kind of values they can take.
Random variables can take on discrete values or continuous values.
A random variable is called a discrete random variable if it takes on discrete values. For example, the number of cars in an image is a discrete value: it could be 5, but it cannot be 5.5.
A continuous random variable is one that takes on continuous values. For example, the height of a student could be 145.5 cm or 145.7 cm, not just 145 cm or 146 cm.
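The difference can be illustrated with Python's standard random module. This is only a sketch: the ranges (0 to 10 cars, 140 cm to 190 cm) and the uniform sampling are assumptions chosen for the example, not something the article specifies.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Discrete: a count of cars is always a whole number
# (assumed here to lie between 0 and 10 inclusive).
num_cars = rng.randint(0, 10)

# Continuous: a height can be any real value in an interval
# (assumed here to be 140 cm to 190 cm).
height_cm = rng.uniform(140.0, 190.0)

print(type(num_cars).__name__, num_cars)
print(type(height_cm).__name__, height_cm)
```

On a computer, floating-point numbers only approximate real values, so the second variable is continuous in the modelling sense rather than literally taking every real number.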
References: PadhAI