pairplot and jointplot Video Lecture Transcript
This transcript was automatically generated by Zoom, so there may be discrepancies between the video and the text.
14:11:32 Hi, everybody! Welcome back. In this video, we're continuing to learn about seaborn.
14:11:37 And we're gonna talk specifically about pairplot and jointplot.
14:11:42 So let's go ahead and open up that Jupyter notebook, and we'll get started.
14:11:46 So pairplot and jointplot don't fit nicely within the sort of three-flavor breakdown of the distribution plots, the relational plots, or the categorical plots, mainly because they're sort of combining various seaborn functions into one. So these
14:12:05 are really useful for exploring your data sets, and explaining your data sets to others.
14:12:10 So we're gonna learn about these 2 just to show you how powerful they are.
14:12:14 And then you can always explore more if you want to learn more about them
14:12:18 than what we've covered. So the first one we'll talk about is pairplot.
14:12:22 So here's a link to the documentation. So pairplot is a figure-level function.
14:12:29 It makes a grid of plots, where each row and each column of the grid represents a variable. So each row will have one variable plotted on the vertical axis, and each column will have one variable, so all the plots in that column will have that variable
14:12:47 plotted on the horizontal axis. And if you have a situation where the vertical and the horizontal axis are the same variable, like, say, bill length, it will instead plot a histogram or a KDE plot, depending upon different arguments to
14:13:04 the function. So I think this is a lot of words to describe something visual, and I think that means it's probably helpful to just show the visual thing.
14:13:13 So we're gonna go ahead and, remember, we're using this penguins data set that has different values for bill length,
14:13:20 bill depth, flipper length, and body mass for different types of penguins.
14:13:24 So we're gonna go ahead and call pairplot.
14:13:28 So sns.pairplot, and we're gonna input data equal to penguins.
14:13:35 And that's the only thing we're gonna input.
14:13:35 And we'll see what comes out. Okay. So we can see this grid.
14:13:40 It's 4 by 4, and each of the rows and columns is one of our continuous variables.
14:13:48 So bill length, bill depth, flipper length, and body mass. So on the diagonal, in the instances where the horizontal and the vertical variable would be the same,
14:14:00 we have a histogram for that variable. So, for instance, these are
14:14:05 the histogram of the bill length, the histogram of the bill depth, and so forth. And in situations where they are different,
14:14:12 so, for instance, in this bottom left-hand corner, we plot the scatter plot of, in this example, body mass and bill length. Okay?
14:14:24 And so these are extremely useful for doing data exploration, for things like regression problems where I'm trying to see does one variable look like it depends linearly upon the other variable.
14:14:36 So, for instance, it looks like the bill length and the body mass maybe are linearly related.
14:14:42 The bill depth and the body mass... like, all of these appear to be linearly related to the body mass, I would say.
14:14:49 So pairplot, like basically every other seaborn plot, has a hue argument.
14:14:54 So we could input, for instance, the species. And now we can see, one,
14:15:01 the scatter plots are colored like scatterplot would color them.
14:15:06 In addition, we also see that when we have hue equal to species, to make the plots a little less cluttered, instead of a histogram they show the KDE plot. Okay.
14:15:18 Another thing we can do: you might notice that basically all the plots in the upper right-hand corner of the grid are essentially the exact same plots as in the lower left-hand corner, just flipping one of the axes with the other. So maybe you don't want all
14:15:34 of them, because you think it's just repeated information that's not as useful to you.
14:15:39 So you can set an argument called corner equal to True, and when you do that it removes all of the plots from the upper right-hand corner and just keeps the ones in the lower left-hand corner, the lower triangle, like a lower triangular matrix. Okay, all right, what's another
14:16:01 thing you can do to customize this? Well, let's say you don't want all these scatter plots, and you also don't want these KDE or histogram plots.
14:16:11 You can specify what variables are put on the x-axis in each column, and what variables are on each row.
14:16:19 So, with the x_vars argument, you input a list of the different variables that you would like as columns.
14:16:30 So this will result in a grid where bill length is the first column, bill
14:16:36 depth is the second column, and flipper length is the third.
14:16:40 Notice that the difference between this and the regular one is that we no longer have a body mass column. And then finally, or maybe not finally, you also have a y_vars argument.
14:16:52 So y_vars, where body mass is my only input to this list, meaning the only row is going to be the row corresponding to body mass.
14:17:04 And so this gets rid of what might be a lot of clutter. If the only variable you're interested in as your y
14:17:12 variable is body mass, then maybe you don't need these 3 other scatter plots.
14:17:17 And now you're looking directly at the body mass versus the other variables.
14:17:24 Okay, so pairplot is able to make scatter plots.
14:17:30 So, you know, we see these scatter plots in the off-diagonal entries.
14:17:35 There are other plots that we can make.
14:17:38 Sorry, I was getting a little tired. There are other plots that we can make, and we can
14:17:43 make these with the kind argument. So one of them is the scatter plot.
14:17:48 That's the default. You can also make a regression plot by setting kind equal to reg.
14:17:55 You can make bivariate histograms by setting kind equal to hist, and you can make bivariate KDE plots by setting kind equal to
14:18:05 kde. So here I'm going to go ahead and set kind equal to reg, so you can see what that looks like.
14:18:13 Okay, so I've got my regression plots on there.
14:18:20 And you can see this argument here, where I've got plot_kws.
14:18:25 All this is doing is setting my regression lines to be black, and my scatter points to be a little bit more
14:18:32 see-through, to make it easier to see. Okay, so this is a dictionary.
14:18:37 And again, these arguments take in dictionaries.
14:18:41 You can also, for instance, make bivariate histograms by setting the kind argument equal to hist.
14:18:51 You can also change the type of plot on the diagonal, which here is a histogram, with the diag_kind argument.
14:19:01 So the default is auto, and when you set it equal to auto, pairplot will choose which plot type is best,
14:19:07 depending upon the various other arguments. Like, for instance, when you have a hue argument, it decides that the KDE plots are best. When you don't have a hue argument, it decides that the histograms are the best ones.
14:19:22 You can set it so it has to be a histogram or a KDE plot by making, you know,
14:19:26 diag_kind equal to hist or kde, or you can make it None so that there are these useless plots, and we'll see an example of this. So here's an example where I have not enabled the hue,
14:19:39 but I still get the KDE plots. And then here's an example where I set my diag_kind argument equal to None, and you'll see that it just plots these useless scatter plots of bill length against bill length, which just results in the line y equals x.
14:20:02 So you can also change things about the plots. I sort of showed this earlier on.
14:20:07 So, for instance, we can change arguments for the individual non-diagonal plots with plot_kws.
14:20:16 So, for instance, I can change it so that my scatter plots have an alpha for their markers of 0.3 and an edge color
14:20:22 that's black, and for a regression-type plot I can remove,
14:20:28 for instance, the error bars, and we'll talk about this more. I believe the next notebook will talk about what ci equal to None means.
14:20:39 So here's that example. Okay, you can also customize what the diagonal plots look like,
14:20:48 so these plots here, with a similar argument, diag_kws. So these take in dictionaries whose keys are the arguments to the relevant plot function.
14:20:59 So these keys for the reg kind would be arguments for the regplot function, which we reviewed in an earlier notebook.
14:21:09 If you had, for instance, a bivariate histogram, the keys of the dictionary would be arguments to histplot, and so forth.
14:21:18 So we haven't covered everything extensively about pairplot.
14:21:22 But I'd say you have a pretty good base now. If you're trying to understand more of what we covered, or you'd like to see what else is possible, I encourage you to check out the documentation that I linked to above. The other type of plot we're gonna talk about is a joint
14:21:36 plot. So joint plots are pretty fun. jointplot is a figure-level function that returns three plots,
14:21:43 or rather returns a plot with at least three subplots at the same time. So there's gonna be a main plot, and then two side plots.
14:21:51 So the main plot is gonna take up the bulk of the figure.
14:21:55 It will be a large rectangle, and this rectangle looks at the joint distribution of two variables, so it will be like a KDE plot, a histplot, something like that.
14:22:06 And then on top of that, the two side plots are going to be univariate distributional plots.
14:22:13 One's going to be parallel to the horizontal axis, drawn on the top, and it's going to give the distribution of the horizontal variable, and the other is going to be parallel to the
14:22:24 vertical axis, and give the distribution of the vertical variable.
14:22:28 So again, I think it's interesting, or maybe not interesting, but useful to see this in action.
14:22:36 So we're gonna call jointplot. Again, we're using that penguins data set.
14:22:41 I'm gonna set my x equal to bill length, and I'm gonna set my y equal to bill depth.
14:22:51 Okay, so here we've got a scatter plot as my main plot, and then on top I've got a histogram for bill length, and I've got a histogram for bill depth on the side.
14:23:07 So this plot on the top and this plot on the right-hand side,
14:23:12 these are known as the marginal plots.
14:23:17 Why is that name relevant? Why do we care? We'll see in a second, when we see some arguments to change the appearance of these. So like with all of our seaborn functions, we can color the points according to another variable.
14:23:31 So, for instance, here, let's color it according to species.
14:23:34 So now we've got a scatter plot colored according to species.
14:23:37 And now notice, just like with pairplot, instead of a histogram seaborn is going to draw a KDE plot as the marginal plots.
14:23:48 Now, unlike pairplot, there's not an argument that allows us to change the type back to a histogram.
14:23:54 We're stuck with what seaborn thinks is the best idea.
14:24:00 So we can change what gets plotted in the bivariate region,
14:24:05 so the main part of the plot, with a kind argument.
14:24:07 So the default is scatter, but we can also draw a bivariate KDE plot, a bivariate histogram plot, a bivariate reg plot,
14:24:19 so it would just be this but with regressions on it, something called the hexagonal histogram, which we'll see in a second, and then finally a residual plot, which we did not cover and I'm not going to cover right now, but you can check it out with this link in
14:24:33 the documentation. So I said I'm gonna demonstrate what a hexagonal histogram is.
14:24:39 Let's go ahead and check that out. So I'm going to enter a kind argument of hex, and then we'll see.
14:24:45 So hexagonal histograms will tile the Cartesian plane with hexagons, and then count up the number of observations within each hexagon, and color it accordingly.
14:25:05 So again, the lighter hexagons have fewer observations, and, for instance, these white ones have zero observations, and then the darker hexagons have more observations within them.
14:25:19 Okay? So this is an example of a hexagonal histogram.
14:25:24 So we can make aesthetic changes, like in pairplot, with dictionaries.
14:25:31 So for one, let's say we're interested in changing the arguments to the bivariate plot. You can do that
14:25:38 by providing a dictionary with joint_kws, which stands for joint keywords, and you can change arguments to the marginal plots, like the appearance of the histogram, with marginal_kws. So let's look at an example
14:25:52 where we change the appearance of the histograms. So we can set the number of bins to be 20, and
14:26:03 we can make them see-through by setting fill equal to False.
14:26:08 Okay.
14:26:11 So that's how we can change the appearance of our marginal plots.
14:26:17 We can't change what type of plot they are,
14:26:19 but we can change it so that, once we have that plot, it looks the way we would like.
14:26:25 Similarly, we could change things about the bivariate plot with the joint_kws
14:26:30 argument, where we provide a dictionary where the keys are arguments from the relevant function and the values are the values we'd want those to be. So
14:26:40 if it was a scatter we could do things like hue...
14:26:43 or not hue. Here we could do things like s and edgecolor and alpha, etc.
14:26:50 Okay, all right. So you can customize this even more. But again, I'm gonna leave it to you to check out the documentation of jointplot.
14:26:59 And let me remind you where that is. So the documentation for both jointplot and pairplot... did I not provide the jointplot
14:27:07 one? Let's see, jointplot...
14:27:12 I did not. So in your version I'm gonna put the documentation for jointplot, like, right here.
14:27:19 It's not in the version you're looking at right now, but once you have the version in your hands on your computer, there will be a documentation link for jointplot.
14:27:28 Okay. So that being said, that's everything for pairplot and jointplot.
14:27:33 They didn't really fit nicely into the other plot types that we've covered.
14:27:37 But they're worth knowing, because they make very nice, extensive plots that are useful for data exploration, and people have also made plots like this to look at distributions of variables for presentations.
14:27:51 Okay. So I hope that that was enjoyable. I hope you enjoyed watching this video.