
Distribution of a categorical variable

Parveen Khurana
3 min read · Jan 10, 2021


So far we have looked at the distribution of data with both a distribution plot and a box plot. In both cases we were not looking at the numbers themselves but at their frequency (how often a certain value is present), and in both plots the variables were continuous.

If we look at the diamonds dataset (available with the ‘seaborn’ package), there are some categorical attributes as well. [Figure: snapshot of the diamonds dataset]

Attributes like ‘cut’, ‘color’, and ‘clarity’ seem to have a finite number of values. So, in this article, we discuss how to plot the distribution of a discrete or a categorical variable.
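One quick way to check which attributes take only a few distinct values is to count the unique entries per column. The snippet below is a minimal sketch, assuming pandas is installed; the small `diamonds` frame is a hypothetical stand-in with a handful of illustrative rows, not the full dataset (which can be loaded with `seaborn.load_dataset("diamonds")`, given network access).

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the diamonds dataset.
# The real data can be loaded with seaborn.load_dataset("diamonds").
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.24],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good", "Very Good"],
    "color": ["E", "E", "E", "I", "J", "J"],
    "price": [326, 326, 327, 334, 335, 336],
})

# Number of distinct values in each column; categorical attributes
# such as 'cut' and 'color' have only a handful.
distinct_counts = diamonds.nunique()
print(distinct_counts)

# The distinct levels of one categorical attribute, sorted for display.
cut_levels = sorted(diamonds["cut"].unique())
print(cut_levels)
```

On the full dataset the same two calls should show that ‘cut’, ‘color’, and ‘clarity’ each take only a few distinct values, while ‘carat’ and ‘price’ take many.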

A bar plot is one way to see the distribution of a categorical variable:

First, we group the dataset by the attribute of interest, say ‘cut’, and then apply the ‘.count()’ method to the result.

This splits the data by the possible values of the ‘cut’ attribute and then, for each unique value of ‘cut’, counts the non-null values of each of the other attributes in the dataset.
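The grouping and counting step can be sketched as follows. This is an illustration under stated assumptions rather than the exact command from the original post: the `diamonds` frame here is a small hypothetical stand-in with a few deliberately missing values, so that the "non-null" behaviour of ‘.count()’ is visible.

```python
import pandas as pd

# Hypothetical stand-in data with some missing entries (None),
# used only to illustrate how .count() treats nulls.
diamonds = pd.DataFrame({
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good", "Ideal", "Ideal"],
    "color": ["E", "E", None, "I", "J", "J", "D"],
    "price": [326.0, 326.0, 327.0, None, 335.0, 336.0, 340.0],
})

# Split the rows by each unique value of 'cut', then count the
# non-null entries of every other column within each group.
counts = diamonds.groupby("cut").count()
print(counts)

# For comparison, .size() counts all rows per group, nulls included.
sizes = diamonds.groupby("cut").size()
print(sizes)
```

Here the ‘Good’ group has two rows but only one non-null ‘color’, so the counts can differ from column to column. When a column has no missing values, its counts are simply the frequency of each ‘cut’ value, which is what a bar plot of the distribution would display.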
