Expectation gives a summary of a random variable: it tells us, on average, what value the random variable takes. For example, a geometric random variable can take values from 1 to infinity, but its expectation may be around 11. This is a good summary of the random variable, but it tells us nothing about the spread of the values.
Below are three scenarios. In the first, the random variable X can take only one value, 0, and the probability of taking that value is 1.
In the second scenario, the random variable Y can take on two values, -1 and 1, each with probability 1/2.
And in the third, the random variable Z can take four values, -100, -50, 50, and 100, each with equal probability 1/4.
The expected value of all three random variables is 0.
If we go just by the expectation, these three random variables look very similar, whereas if we look at the actual values we see that the random variable Z (third scenario) has a much wider spread than the random variables Y and X.
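As a quick check, we can encode the three PMFs as dicts from value to probability and confirm that all three expectations come out to 0:

```python
# Three random variables with the same expectation but different spreads.
# Each is represented as a PMF: a dict mapping value -> probability.
X = {0: 1.0}
Y = {-1: 0.5, 1: 0.5}
Z = {-100: 0.25, -50: 0.25, 50: 0.25, 100: 0.25}

def expectation(pmf):
    """E[X] = sum over all values of value * probability."""
    return sum(v * p for v, p in pmf.items())

for name, pmf in [("X", X), ("Y", Y), ("Z", Z)]:
    print(name, expectation(pmf))  # all three expectations are 0
```

The expectation alone cannot distinguish the three, even though the values of Z range from -100 to 100 while X never moves from 0.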
Earlier, we defined the variance as a measure of spread, and the formula for computing the variance of a population is:

σ² = (1/N) Σᵢ (xᵢ − μ)²
The variance of a random variable is defined analogously:

Var(X) = E[(X − E[X])²]
Comparing the two formulas, X plays the role of xᵢ and E(X) plays the role of the mean μ, the center of gravity. The outer expectation then takes a weighted sum of these squared deviations, which is the same as what we do in the case of a population, except that for a population each quantity gets the same weight, whereas for a random variable the weights are proportional to the probabilities.
Let’s expand the square term in the formula, writing μ for E[X]:

Var(X) = E[(X − μ)²] = E[X²] − 2μ·E[X] + E[μ²]
Now, the random variable can take on many values with certain probabilities, but its expected value μ (the center of gravity) is a constant, so E[μ²] (the last term in the formula above) is simply the constant μ². Since E[X] = μ, the middle term is −2μ², and the expression reduces to:

Var(X) = E[X²] − μ² = E[X²] − (E[X])²
(E[X])² is easy to compute once we have the expected value, and E[X²] is also easy to compute, since x² is a function of ‘x’ and we know how to compute the expected value of a function of a random variable.
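A small sketch of this computation, again representing a random variable as a dict from value to probability (the variables Y and Z are the ones from the scenarios above):

```python
# Variance via Var(X) = E[X^2] - (E[X])^2, for a random variable given as a PMF.
def expectation(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    e_x = expectation(pmf)                        # E[X]
    e_x2 = sum(v**2 * p for v, p in pmf.items())  # E[X^2]: expectation of the function x^2
    return e_x2 - e_x**2

# Same expectation (0), very different variances.
Y = {-1: 0.5, 1: 0.5}
Z = {-100: 0.25, -50: 0.25, 50: 0.25, 100: 0.25}
print(variance(Y))  # 1.0
print(variance(Z))  # 6250.0
```

The variance immediately separates the two random variables that expectation alone could not distinguish.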
Let’s take the example of rolling two dice and compute the variance of the random variable that maps each outcome to the sum of the values on the two dice.
As the expected value is the center of gravity, from the plot we can see that 7 is the expected value, and plugging this into the formula gives:

Var(X) = E[X²] − (E[X])² = 329/6 − 49 = 35/6 ≈ 5.83
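We can verify this by enumerating all 36 equally likely outcomes in code, using exact fractions to avoid rounding:

```python
from itertools import product
from fractions import Fraction

# Build the PMF of the sum of two dice from the 36 equally likely outcomes.
pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):
    s = d1 + d2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

e_x = sum(v * p for v, p in pmf.items())      # E[X] = 7
e_x2 = sum(v**2 * p for v, p in pmf.items())  # E[X^2] = 329/6
var = e_x2 - e_x**2                           # 35/6, approximately 5.83
print(e_x, var)
```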
Let’s take one more example: say there are two mutual funds, and we are given the return of each fund for the last five years.
If you have to choose between these two funds, which one would you choose?
Let’s say X is the random variable which denotes the return. Based on the data, the random variable can take on the given five values, each equally likely (we have data for five years, and each of the given values accounts for one year, so each has a probability of 1/5), so we can compute the expected value of both funds as:
The expected return for both the funds is the same, so which one to choose?
When looking at mutual funds, what matters is consistency, and as a measure of consistency we compute the variance:
Similarly, we can compute the expected value of Y², and once we have these values, we can compute the variance of each fund.
Clearly, the random variable Y has a much lower variance than X while having the same expectation, so it makes sense to invest in the second mutual fund: it is more stable and consistent, and you are likely to get a steady return. With the other fund you might get a high return or even a negative one; the risk there is high, but so is the reward.
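The actual yearly returns are in the table above; as an illustrative sketch with hypothetical numbers (made up for this example, not taken from the table), here are two funds that both average a 10% return but differ sharply in variance:

```python
# Hypothetical five-year returns (in %) for two funds -- illustrative numbers only.
returns_x = [-10, 25, 5, 30, 0]   # volatile fund
returns_y = [9, 11, 10, 8, 12]    # steady fund

def expected_return(returns):
    # Each year is equally likely, so each return has probability 1/5.
    return sum(r / len(returns) for r in returns)

def variance(returns):
    mu = expected_return(returns)
    e_sq = sum(r**2 / len(returns) for r in returns)  # E[X^2]
    return e_sq - mu**2

print(expected_return(returns_x), expected_return(returns_y))  # both average 10
print(variance(returns_x), variance(returns_y))  # X's variance is far larger
```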
As the variance is not in the same units as the original quantity, we take its square root to get the standard deviation, which has the same units as the original quantity.
Properties of Variance
Here again, the first property relates to random variables that have a linear relationship. Let’s say we have a random variable ‘Y’ which is a linear function of the random variable ‘X’, say Y = aX + b, and we are interested in the variance of ‘Y’ in terms of the variance of ‘X’.
Taking the constant terms outside and expanding the formula, we get:

Var(Y) = Var(aX + b) = a²·Var(X)
We have the same effect as for the variance of a population: if we transform each value, the variance becomes a² times the original variance, where ‘a’ is the multiplicative constant (the value by which the values are scaled). The additive constant ‘b’ shifts every value equally and does not change the spread.
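A quick check of this property on a single fair die, with the constants a and b chosen arbitrarily:

```python
from fractions import Fraction

# Verify Var(aX + b) = a^2 * Var(X) on a fair die, using exact fractions.
die = {v: Fraction(1, 6) for v in range(1, 7)}

def variance(pmf):
    mu = sum(v * p for v, p in pmf.items())
    return sum(v**2 * p for v, p in pmf.items()) - mu**2

a, b = 3, 5
transformed = {a * v + b: p for v, p in die.items()}  # Y = 3X + 5

print(variance(die))          # 35/12
print(variance(transformed))  # 9 * 35/12 = 105/4
```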
The other property concerns the sum of random variables.
This property (that the variance of the sum of random variables is equal to the sum of their variances) holds in a very specific case: when the random variables are independent.
Just like we have independent events, we can have independent random variables. Two random variables are said to be independent if

P(X = x | Y = y) = P(X = x)

for all the values ‘x’ that X can take and all the values ‘y’ that Y can take.
Let’s take an example where two random variables are not independent, returning to the two-dice example.
Let X be the random variable which indicates the number on the first die, and Y be the random variable which indicates the sum of the numbers on the two dice.
Here is a PMF of the random variable Y
Let’s take the case where the sum of the numbers on the two dice is 8. There are five outcomes with this sum, (2,6), (3,5), (4,4), (5,3), and (6,2), so we have P(Y = 8) = 5/36.
Let’s consider the probability P(Y = 8 | X = 1), i.e., given that the first die shows a 1.
As the first die shows a 1, the maximum possible value of the sum is 7 (when the second die shows a 6); the sum can never be greater than 7, and therefore P(Y = 8 | X = 1) = 0.
And since P(Y = 8 | X = 1) is not equal to P(Y = 8), we can say that these two random variables are not independent.
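We can confirm both probabilities by enumerating all 36 outcomes:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))

# P(Y = 8): outcomes whose sum is 8.
p_y8 = Fraction(sum(1 for d1, d2 in outcomes if d1 + d2 == 8), 36)

# P(Y = 8 | X = 1): restrict to outcomes where the first die shows 1.
given_x1 = [(d1, d2) for d1, d2 in outcomes if d1 == 1]
p_y8_given_x1 = Fraction(sum(1 for d1, d2 in given_x1 if d1 + d2 == 8), len(given_x1))

print(p_y8)           # 5/36
print(p_y8_given_x1)  # 0
```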
In a similar way, we have the definition of ‘n’ independent random variables as follows:
So, ‘n’ random variables are said to be independent if the joint probability of these random variables taking on some values is equal to the product of their individual probabilities:

P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ) = P(X₁ = x₁)·P(X₂ = x₂)·…·P(Xₙ = xₙ)
Given ‘n’ such independent random variables, the following property holds:
The variance of the sum of these random variables is equal to the sum of their variances, provided the random variables are independent:

Var(X₁ + X₂ + … + Xₙ) = Var(X₁) + Var(X₂) + … + Var(Xₙ)
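For the two-dice example, where the two dice are independent, we can verify that the variance of the sum (35/6, computed earlier) is exactly twice the variance of a single die:

```python
from itertools import product
from fractions import Fraction

def variance(pmf):
    mu = sum(v * p for v, p in pmf.items())
    return sum(v**2 * p for v, p in pmf.items()) - mu**2

# A single fair die, and the PMF of the sum of two independent dice,
# built by multiplying the probabilities of each independent pair.
die = {v: Fraction(1, 6) for v in range(1, 7)}
sum_pmf = {}
for (v1, p1), (v2, p2) in product(die.items(), repeat=2):
    sum_pmf[v1 + v2] = sum_pmf.get(v1 + v2, Fraction(0)) + p1 * p2

print(variance(die))      # 35/12 for each die
print(variance(sum_pmf))  # 35/6 = Var(X1) + Var(X2)
```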