The joint distribution of two variables

Parveen Khurana
5 min read · Jan 15, 2021

So far we have discussed how to visualize and understand the distribution of a single attribute. In this article, we discuss the joint distribution of two variables.

A joint distribution helps us understand how two variables are related. If we have two variables, ‘x’ and ‘y’, we can plot a KDE for each, but those two plots alone would not tell us, for instance, what the distribution of ‘y’ looks like when ‘x’ is small, or more generally how the two attributes are correlated.

Let’s first define two independent variables (both normally distributed).
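The original code for this step is not available here, so the following is only a minimal sketch of how two such variables could be generated with NumPy. The seed, the sample size of 500, and the standard normal parameters are all assumptions chosen for illustration.

```python
import numpy as np

# Assumed values: seed, sample size, mean and standard deviation are illustrative.
rng = np.random.default_rng(0)

# Two separate draws from a normal distribution; because they are drawn
# independently, 'x' carries no information about 'y'.
x = rng.normal(loc=0.0, scale=1.0, size=500)
y = rng.normal(loc=0.0, scale=1.0, size=500)
```

Since the two arrays are sampled separately, their sample correlation should be close to zero, which is what we expect to see later in the joint plot.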

Next, we create a dataframe from these two variables.
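One way this could look with pandas is sketched below. The column names ‘x’ and ‘y’ follow the text; the variable name `df` and the data-generation details are assumptions.

```python
import numpy as np
import pandas as pd

# Assumed data: two independent standard normal samples of size 500.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)

# Each array becomes one column; each row is one (x, y) observation.
df = pd.DataFrame({"x": x, "y": y})
print(df.head())
```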

Now we can draw a joint plot by calling ‘sns.jointplot()’ and passing in the ‘x’ and ‘y’ columns of the newly created dataframe.

Alternatively, we can pass the column names ‘x’ and ‘y’ and supply the dataframe itself as the value of the ‘data’ argument.

What we get is a 2D plot where each dot (in the scatter plot) corresponds to one row/data item of the dataframe. It also shows two histograms: one at the top, which shows the distribution of the attribute on the x-axis and tells us how the data is spread as we vary ‘x’; the other is drawn along the y-axis, to the right of the scatter plot, and shows the data distribution with respect to the…
