Distribution of a categorical variable
So far we have looked at the distribution of data with both a distribution plot and a box plot. In each case we looked not at the numbers themselves but at their frequency (how often a certain value is present). Both plots, however, dealt with continuous variables.
The diamonds dataset (available with the ‘seaborn’ package) also contains some categorical attributes. Below is a snapshot of this dataset:
Attributes like ‘cut’, ‘color’, and ‘clarity’ seem to have a finite number of values. So, in this article, we discuss how to plot the distribution of a discrete or a categorical variable.
A bar plot is one way to see the distribution of a categorical variable:
First, we group the dataset by the attribute of interest, say ‘cut’, and then apply the ‘.count()’ method to the result.
This command splits the data by the possible values of the ‘cut’ attribute and then, for each unique value of ‘cut’, counts the non-null values in each of the other attributes of the dataset.
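As a minimal sketch of the grouping step, the snippet below applies it to a small hand-made table. The rows in `toy` are hypothetical values that only mimic a few of the diamonds columns (they are not taken from the real dataset), and the names `toy`, `counts`, and `rows_per_cut` are assumptions made for illustration. Two missing values are included on purpose to show that ‘.count()’ counts only non-null entries.

```python
import pandas as pd

# Hypothetical toy rows mimicking a few diamonds columns (not real data).
# None marks a missing value so the non-null behaviour is visible.
toy = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Ideal", "Good", "Premium", "Ideal"],
    "color": ["E", "E", "J", None, "I", "H"],
    "price": [326, 326, 334, 335, None, 336],
})

# Split the rows by each unique value of 'cut', then count the
# non-null entries in every other column within each group.
counts = toy.groupby("cut").count()
print(counts)

# For comparison: size() counts rows per group, nulls included.
rows_per_cut = toy.groupby("cut").size()
print(rows_per_cut)
```

With the real dataset, the same calls would presumably be applied to the frame returned by `sns.load_dataset("diamonds")`. Any single column of `counts` then gives one number per category, which is the kind of input a bar plot needs.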