
Distribution of a categorical variable

Jan 10, 2021

So far we have looked at the distribution of data with both a distribution plot and a box plot. In both cases we were looking not at the values themselves but at their frequencies (how often a certain value occurs), and in both plots the variables were continuous.

The diamonds dataset (available with the ‘seaborn’ package) also contains some categorical attributes. A snapshot of this dataset is shown below:

Attributes like ‘cut’, ‘color’, and ‘clarity’ seem to have a finite number of values. So, in this article, we discuss how to plot the distribution of a discrete or a categorical variable.
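One quick way to check that an attribute takes only a handful of values is to list its distinct values. The snippet below is a minimal sketch on a small hand-made DataFrame; the `toy` table and its values are made up for illustration and are not the real diamonds data, although the column names mirror it.

```python
import pandas as pd

# Toy stand-in for the diamonds data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "carat": [0.23, 0.21, 0.29, 0.31, 0.24],
    "cut": ["Ideal", "Premium", "Premium", "Good", "Ideal"],
    "color": ["E", "E", "I", "J", "J"],
})

# A categorical attribute has a small, fixed set of distinct values
cut_levels = sorted(toy["cut"].unique())
n_cut_levels = toy["cut"].nunique()

print(cut_levels)
print(n_cut_levels)
```

The same two calls should work on the real dataset once it is loaded into a DataFrame.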

A bar plot is one way to see the distribution of a categorical variable:

First, we group the dataset by the attribute of interest, say ‘cut’, and then apply the ‘.count()’ method to the result.

This command splits the data by the possible values of the ‘cut’ attribute and then, for each unique value of ‘cut’, counts the non-null values of each of the other attributes in the dataset.
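The grouping and counting step described above can be sketched as follows. This again uses a small hand-made DataFrame (the `toy` values are hypothetical, not the real diamonds data), with one missing ‘carat’ value added on purpose to show that ‘.count()’ only counts non-null entries.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the diamonds data (hypothetical values, not the real dataset)
toy = pd.DataFrame({
    "carat": [0.23, 0.21, np.nan, 0.31, 0.24, 0.30],
    "cut": ["Ideal", "Premium", "Premium", "Good", "Ideal", "Ideal"],
    "price": [326, 326, 334, 335, 336, 339],
})

# Split the rows by each unique value of 'cut', then count the
# non-null values of every other column within each group
counts = toy.groupby("cut").count()
print(counts)

# Any column without missing values gives the number of rows per 'cut'
cut_freq = counts["price"]
print(cut_freq)
```

In this toy table the ‘Premium’ group has two rows but only one non-null ‘carat’, so the two columns of `counts` differ for that group. A per-category series such as `cut_freq` is the kind of input a bar plot needs, for example via `cut_freq.plot(kind="bar")` if matplotlib is installed.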


Written by Parveen Khurana
