Alternative ways of computing the percentile value

In the previous article, we discussed what the term ‘percentile’ means and how to compute the pᵗʰ percentile in a given dataset. In this article, we discuss some alternative ways of computing the percentile values.

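Recall that the procedure for computing percentiles which we discussed in the previous article uses ‘(n+1)’, where n is the number of values in the dataset, when computing the location of the percentile value.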

Let’s look at the alternative way to compute the percentile value:

So, in this alternative approach, we use ‘n’ instead of ‘(n+1)’ when computing the location of the percentile value. If the location turns out to be an integer, say 18, then the percentile value is the mean of the values at locations 18 and 19.

And if the location of the percentile is not an integer, say 18.2, then we take the percentile value to be the value at the next integer location (i.e. at location 19).

Let’s take an example where we use the approach defined above to compute the percentile value.
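
The two rules described above (mean of two neighbouring values for an integer location, next integer location otherwise) can be sketched in Python roughly as follows. This is only an illustrative sketch: the function name `percentile_alt` and the sample data are made up for this example, and p is assumed to be a whole number strictly between 0 and 100.

```python
def percentile_alt(values, p):
    """Percentile using the location (p / 100) * n on the sorted data.

    Hypothetical helper for illustration; assumes p is an integer with 0 < p < 100.
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or not 0 < p < 100:
        raise ValueError("need a non-empty dataset and 0 < p < 100")

    # Location = p * n / 100, split into its integer part and a remainder
    # so that the "is it an integer?" test avoids floating-point rounding.
    whole, remainder = divmod(p * n, 100)

    if remainder == 0:
        # Integer location: mean of the values at locations whole and whole + 1
        # (locations are 1-based, Python indices are 0-based).
        return (data[whole - 1] + data[whole]) / 2

    # Non-integer location: value at the next integer location (whole + 1).
    return data[whole]


# Made-up dataset of 25 values in which the value equals its location.
scores = list(range(1, 26))
print(percentile_alt(scores, 70))  # location 17.5 -> value at location 18 -> 18
print(percentile_alt(scores, 80))  # location 20 -> mean of locations 20 and 21 -> 20.5
```

Because the made-up values equal their locations, the printed results show directly which locations were used.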

The intuition behind the alternative approach:

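In this approach, the pᵗʰ percentile is defined as a value such that at least p percent of the values in the dataset are less than or equal to it and at least (100-p) percent of the values are greater than or equal to it.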

So, compared to the definition used in the previous approach, this definition has an additional part: at least (100-p) percent of the values must be greater than or equal to the pᵗʰ percentile.

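Let’s understand the effect of this when computing the percentile. Take a sorted dataset of n = 25 values and p = 70, so the percentile location is (70/100) × 25 = 17.5. The definition then requires at least 17.5 values to be less than or equal to the percentile value and at least 7.5 values to be greater than or equal to it.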

So, the value at location 18 satisfies the condition that at least 17.5 values are less than or equal to it. Location 17, on the other hand, has only 17 values less than or equal to it, which falls short of the required 17.5.

This first condition also holds for location 19, 20, and so on all the way up to 25.

Now consider the second condition as well: at least 7.5 values should be greater than or equal to the percentile value. At location 18, there are 8 values in the dataset (locations 18 to 25) that are greater than or equal to the value at that location, so the condition is satisfied.

At location 17, there are 9 values greater than or equal to the value at that location, so the second condition holds there too. But location 17 does not satisfy the first condition, which requires at least 17.5 values to be less than or equal to the value at that location.

So, the first condition (at least 17.5 values less than or equal to it) is satisfied by every location from 18 to 25, and the second condition (at least 7.5 values greater than or equal to it) is satisfied by every location from 1 to 18.

Location 18 is the only location that satisfies both conditions, and hence we output the value at location 18 as the percentile. That is exactly what this approach does: if the percentile location is 17.5, we take the integer part, add 1 to it, and use the value at this new location.
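
The reasoning above can also be checked by brute force. The sketch below simply tests both conditions at every location; the function name `locations_satisfying_both` is hypothetical, and it assumes the values are sorted and counted by position, so that location k has k values less than or equal to it and n - k + 1 values greater than or equal to it.

```python
def locations_satisfying_both(n, p):
    """Return the 1-based locations that meet both parts of the definition.

    Hypothetical helper for illustration; assumes n sorted values counted by
    position and an integer p with 0 < p < 100.
    """
    matches = []
    for location in range(1, n + 1):
        at_or_below = location           # values <= the value at this location
        at_or_above = n - location + 1   # values >= the value at this location
        # Compare against p% and (100 - p)% of n without using floats.
        first_ok = 100 * at_or_below >= p * n
        second_ok = 100 * at_or_above >= (100 - p) * n
        if first_ok and second_ok:
            matches.append(location)
    return matches


print(locations_satisfying_both(25, 70))  # [18]
```

For n = 25 and p = 70, only location 18 passes both checks, which matches the argument in the text.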

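Let’s look at the second case, where the percentile location is an integer. Take the same n = 25 values and p = 80, so the percentile location is (80/100) × 25 = 20. The definition then requires at least 20 values to be less than or equal to the percentile value and at least 5 values to be greater than or equal to it.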

The percentile location comes out as 20. Every location from 20 to 25 satisfies the first condition that at least 20 values are less than or equal to the value at that location.

For the other part of the definition, we need at least 5 values to be greater than or equal to the percentile value, and that condition is satisfied by every location less than or equal to 21, i.e. all the locations from 1 to 21.

So locations 20 and 21 each satisfy both conditions, and that’s why we take the average of the values at these two locations.
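
Both cases can be summarised in one short derivation. As a sketch, assume the n values are sorted and counted by position, so that location k has k values less than or equal to it and n - k + 1 values greater than or equal to it. The symbols L and k are introduced here only for illustration.

```latex
\begin{align*}
L &= \frac{p}{100}\,n \\
k &\ge L && \text{(first condition)} \\
n - k + 1 &\ge \frac{100 - p}{100}\,n = n - L && \text{(second condition)} \\
\Longrightarrow\quad L &\le k \le L + 1
\end{align*}
```

If L is an integer, both k = L and k = L + 1 fall in this range, which is why the two values are averaged. If L is not an integer, the only whole number in the range is the next integer after L.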

Now let’s look at the next alternative way of computing the percentiles:

This approach is almost the same as the first method. The only difference is that when the percentile location is not an integer, we use 0.5 instead of the actual fractional part when computing the percentile value.

So, 0.5 is used as an approximation. Let’s say the location of the percentile value is 18.2. We know that it lies between locations 18 and 19, and we are not bothered about whether it is closer to 18 or to 19: we just take 0.5 as an approximation of where it lies between 18 and 19, irrespective of what the actual fractional part is.
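
The 0.5 approximation can be illustrated with a small sketch that takes an already computed location as input. The function name `half_step_value` and the sample data are made up for this example, and the handling of an integer location (returning the value stored there) is an assumption, since the text only describes the non-integer case.

```python
import math


def half_step_value(sorted_values, location):
    """Value at a 1-based location, using 0.5 in place of the fractional part.

    Hypothetical helper for illustration; expects already sorted values.
    """
    lower = math.floor(location)
    if not 1 <= lower <= len(sorted_values):
        raise ValueError("location is outside the dataset")
    if location == lower:
        # Assumption: an integer location returns the value stored there.
        return sorted_values[lower - 1]
    if lower == len(sorted_values):
        raise ValueError("there is no value after the last location")

    below = sorted_values[lower - 1]  # value at location `lower`
    above = sorted_values[lower]      # value at location `lower + 1`
    # Always step halfway between the two neighbours, whatever the fraction is.
    return below + 0.5 * (above - below)


# Made-up sorted dataset: 10, 20, ..., 250 (25 values).
data = list(range(10, 260, 10))
print(half_step_value(data, 18.2))  # 185.0
print(half_step_value(data, 18.9))  # 185.0, the actual fraction is ignored
```

Locations 18.2 and 18.9 give the same result because only the two neighbouring locations matter, not how close the location is to either of them.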

Comparing the values of the 70th percentile and the 80th percentile computed with the three approaches that we discussed, there is a slight difference in the final answers. In certain practical applications, this could make a difference in terms of which candidates get short-listed, which items get selected, and so on.

References: PadhAI
