Typical trends in a histogram
The main purpose of drawing a histogram is to see whether any trends are visible in the data. There are some standard trends that we should look out for in a histogram, and this article discusses them. So, let’s get started.
Here is the list of standard trends:
- How far are the values in the data spread out?
- Is the data density high in certain intervals? If the data is divided into class intervals, are there a few intervals with very tall bars while the others have very short bars? If so, most of the data is concentrated in those class intervals, and they can be used for further analysis
- Are there any gaps in the data? Are there certain regions where there are no data values available?
- Are there any outliers in the data, and if so, what are they? That is, are there any values that lie very far from the other values?
Let’s answer each of these questions for a dataset of the runs scored by Sachin Tendulkar in all the ODIs he played. Here is the histogram of that data, using a bin width of 10:
- The data is spread from 0 to 200 (the range of x-axis values), which is clear from the histogram
- Most of the data is concentrated in the interval 0 to 40. These are the high-density intervals
- Between 150 and 200, some regions have no data values at all and others have only a few
- There is also an outlier: the value 200 lies far from where most of the data is concentrated
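The same four questions can also be answered in code. The following is a minimal sketch, not the method used to produce the plot above: the `summarize_histogram` helper, the "half as tall as the tallest bar" threshold, the gap-based outlier heuristic, and the sample scores are all assumptions made for illustration.

```python
from collections import Counter


def summarize_histogram(values, bin_width=10):
    """Bin values into fixed-width intervals and answer the four questions."""
    # Label each bin by its lower edge, e.g. 37 falls in bin 30 for width 10.
    counts = Counter((v // bin_width) * bin_width for v in values)
    all_bins = list(range(min(counts), max(counts) + bin_width, bin_width))

    # 1. Spread: smallest and largest observed values.
    spread = (min(values), max(values))

    # 2. High-density bins: bars at least half as tall as the tallest bar
    #    (an arbitrary threshold chosen for this sketch).
    tallest = max(counts.values())
    dense_bins = [b for b in all_bins if counts[b] >= tallest / 2]

    # 3. Gaps: bins inside the observed range that hold no values.
    gaps = [b for b in all_bins if counts[b] == 0]

    # 4. Outlier candidates: values above the highest empty bin, i.e. cut off
    #    from the rest of the data by a gap. This simple heuristic only looks
    #    at the upper end of the range.
    outliers = sorted(v for v in values if gaps and v > max(gaps))

    return {"spread": spread, "dense_bins": dense_bins,
            "gaps": gaps, "outliers": outliers}


# Made-up scores for illustration only; this is not the real ODI data.
scores = [0, 4, 8, 12, 15, 18, 21, 25, 27, 33, 36, 39,
          44, 52, 61, 75, 88, 200]
summary = summarize_histogram(scores)
for question, answer in summary.items():
    print(question, answer)
```

Each bin is identified by its lower edge, so a dense bin of 30 means the interval from 30 up to (but not including) 40.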
Apart from these, there are also some standard patterns. One of them is the left-skewed histogram. Below are some examples of left-skewed histograms:
Just as long-tailed distributions are common in frequency plots, these kinds of left-skewed histograms are also very common in many applications.
So, from the above plots, we can say that in a left-skewed histogram, the short bars form a long tail towards the left and the tall bars sit towards the right.
The histogram below shows the protein content of liver patients, and it is an interesting plot. It is largely left-skewed, because there are very small bars towards the left and most of the tall bars are on the right. However, there is also a small set of fairly big bars on the far right that forms a heavy right tail. So this is not just a left-skewed histogram but a special case: a long tail on one side and a heavy tail on the other.
Similarly, we have the right-skewed histogram, where most of the data is concentrated towards the left and there is a long tail of short bars towards the right.
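The direction of the skew can also be checked numerically rather than by eye. The sketch below uses the moment coefficient of skewness (third central moment divided by the variance raised to the power 1.5), which is one common measure and not something the plots in this article rely on. The `skewness` helper and both samples are made up for illustration.

```python
from statistics import mean


def skewness(values):
    """Moment coefficient of skewness: third central moment / variance**1.5."""
    m = mean(values)
    n = len(values)
    second = sum((v - m) ** 2 for v in values) / n  # population variance
    third = sum((v - m) ** 3 for v in values) / n   # third central moment
    # A constant sample has zero variance; treat it as having no skew.
    return third / second ** 1.5 if second else 0.0


# Made-up samples for illustration.
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9, 15]  # bulk on the left, tail to the right
left_skewed = [16 - v for v in right_skewed]     # mirror image: tail to the left

print("right-skewed sample:", skewness(right_skewed))  # positive value
print("left-skewed sample:", skewness(left_skewed))    # negative value
```

A positive value points to a tail on the right, a negative value to a tail on the left, and a value near zero suggests the data is roughly symmetric.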
Here are some examples of right-skewed histograms:
Then we also have the uniform histogram. Say we are plotting data where someone simulated rolling an unbiased die 100,000 times. Since the die is unbiased, we expect all the faces to show up more or less an equal number of times. In the plot below, we have gone to the extreme and set all the faces to appear exactly the same number of times. In practice, the counts would all fall within a very small range of each other and the distribution would look almost uniform. That is what a uniform distribution means: all the values appear with the same, or almost the same, frequency.
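To see what "almost uniform" looks like in practice, the die experiment can be simulated. This is a minimal sketch using Python's standard `random` module; the function name `simulate_die_rolls` and the seed value are assumptions chosen here so that the run is repeatable.

```python
import random
from collections import Counter


def simulate_die_rolls(n_rolls=100_000, seed=0):
    """Roll a fair six-sided die n_rolls times and count each face."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(rng.randint(1, 6) for _ in range(n_rolls))


counts = simulate_die_rolls()
expected = 100_000 / 6  # count per face if the split were perfectly even

for face in range(1, 7):
    # The ratio should be close to 1.0 for every face.
    print(face, counts[face], f"{counts[face] / expected:.3f}")
```

The six counts will not be identical, but each should land close to the expected count, which is why the resulting histogram looks nearly flat.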
Here is one example of a symmetric histogram:
In the above histogram, it is clear that the data is concentrated towards the middle, with the extreme bars on either side. This is a perfectly symmetric distribution: if we place a mirror at the tallest bar, we get an exact replica on the other side.
The above histogram is almost symmetric: it has a tail on the left side, then a peak, and then an almost identical tail on the right side. It is not perfectly symmetric, but it is close.
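The mirror test can be written as a small check on the bar heights. This is a sketch under one assumption: the tallest bar sits in the centre of the list, so mirroring about the centre is the same as mirroring about the tallest bar. The helper names, the `tolerance` parameter, and the bar heights are made up for illustration.

```python
def mirror_differences(bar_heights):
    """Absolute difference between each bar and its mirror image about the centre."""
    return [abs(a - b) for a, b in zip(bar_heights, reversed(bar_heights))]


def is_symmetric(bar_heights, tolerance=0):
    """True if every bar matches its mirror image to within the tolerance."""
    return all(d <= tolerance for d in mirror_differences(bar_heights))


# Made-up bar heights for illustration.
perfect = [2, 7, 15, 7, 2]
almost = [2, 7, 15, 8, 3]

print(is_symmetric(perfect))              # True
print(is_symmetric(almost))               # False
print(is_symmetric(almost, tolerance=1))  # True
```

With a tolerance of zero only a perfect mirror image passes, while a small positive tolerance accepts the almost symmetric case as well.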
In this article, we discussed the typical trends in histograms of real-world data. These typical trends correspond to left-skewed, right-skewed, uniform, and symmetric histograms.