The joint distribution of two variables
So far we have discussed how to visualize and understand the distribution of a single attribute. In this article, we discuss the joint distribution of two variables.
The joint distribution helps us understand how two variables are related. Given two variables ‘x’ and ‘y’, we could plot a KDE for each, but those two plots would not tell us, for instance, what the distribution of ‘y’ looks like when ‘x’ is small, or more generally how the two attributes are correlated.
Let’s first define two independent variables (both normally distributed).
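As a minimal sketch of this step, the two variables could be drawn with NumPy as shown here. The sample size, the seed, and the choice of a standard normal (mean 0, standard deviation 1) are assumptions made for illustration; the text only says that both variables are normally distributed and independent.

```python
import numpy as np

# Assumed for illustration: a fixed seed (for reproducibility) and 1000 samples.
rng = np.random.default_rng(seed=0)
n_samples = 1000

# Two separate draws from a standard normal, so 'x' and 'y' are independent.
x = rng.normal(loc=0.0, scale=1.0, size=n_samples)
y = rng.normal(loc=0.0, scale=1.0, size=n_samples)

# The sample correlation of independent draws should be close to zero.
print(np.corrcoef(x, y)[0, 1])
```

Because the two arrays come from separate draws, knowing ‘x’ tells us nothing about ‘y’, which is what the joint plot should reflect.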
Next, we create a dataframe from these two variables.
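One way to build such a dataframe with pandas is sketched here. The name `df`, the seed, and the sample size are illustrative assumptions; the column names ‘x’ and ‘y’ follow the text.

```python
import numpy as np
import pandas as pd

# Assumed setup: two independent standard-normal samples of 1000 points each.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=1000)
y = rng.normal(size=1000)

# Each dictionary key becomes a column; each row pairs one 'x' with one 'y'.
df = pd.DataFrame({"x": x, "y": y})
print(df.head())
```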
Now we can draw a joint plot by calling ‘sns.jointplot()’ and passing in the ‘x’ and ‘y’ columns of the newly created dataframe.
Alternatively, we can pass the column names ‘x’ and ‘y’ and give the dataframe itself as the value of the ‘data’ argument.
What we get is a 2D plot in which each dot of the scatter plot corresponds to one row (data item) of the dataframe. The plot also shows two histograms: the one at the top gives the distribution of the attribute on the x-axis, telling us how the data is spread as we vary ‘x’; the other, to the right of the scatter plot along the y-axis, gives the data distribution with respect to the…