relplots Video Lecture Transcript
This transcript was automatically generated by Zoom, so there may be discrepancies between the video and the text.
12:22:29 Hi, everybody, welcome back. In this video we continue to learn about seaborn, where we're gonna learn about relational plots, or relplots, in seaborn. So let me go ahead and get our
12:22:43 Jupyter notebook up and we'll get started.
12:22:45 So we're gonna introduce the relational plot type: what that means
12:22:49 in general, and then what it means specifically in seaborn.
12:22:54 We'll learn about the relplot function and then, specifically, the scatterplot and lineplot functions.
12:23:01 And then at the end we'll see 2 spin-offs that sort of combine
12:23:05 these 2 relplots into a single figure. So relational plots are plot types that look at the potential relationships between variables.
12:23:15 So, for instance, a scatterplot where I'm plotting one variable against another, and trying to see if there's any relationship between the 2.
12:23:22 So that's the idea behind relational plots.
12:23:26 These are used quite a bit in exploratory data analysis, as well as just, you know,
12:23:31 let's say you actually find what you think is a relationship between 2 variables.
12:23:35 You might make a scatterplot to show that to your office.
12:23:38 So we're gonna start with the 2 axes-level functions and then show you the figure-level function.
12:23:43 And then at the end we'll show you a couple that are not technically relational plots, but they do use sort of the same relational plot makeup.
12:23:55 So the first one, which we've seen in all of our example plots in the past 2 notebooks, is scatterplot.
12:24:01 So this is the way that seaborn makes a scatter plot.
12:24:04 This is an axes-level function, and once again we're going to use the tips data set, which gives the tip received by various waiters on a variety of waiter shifts.
12:24:14 So maybe on Sunday, at dinner, with a table of 2, how much the bill was and what the tip was.
12:24:22 My natural question is, what is the relationship, if any, between the final tip you receive and the total bill?
12:24:29 So maybe you want to make a scatter plot of that, so you'll call sns.scatterplot.
12:24:35 You'll input the data, which is the tips data frame.
12:24:40 And now that we have the data, we just have to specify what else we want using the arguments.
12:24:49 So we want total_bill on the horizontal axis.
12:24:51 So x is equal to total_bill, and we want tip on the vertical axis, so y is equal to tip.
12:24:59 And now we'll display the figure with plt.show.
12:25:03 So here we go. We've got exactly what we want.
12:25:05 And we could actually maybe make this even a little bigger. Let's say 6 and 4.
12:25:09 Okay, so that's how you make a scatter plot.
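The scatterplot call just described might look roughly like the sketch below. The notebook uses the tips data set; so that this runs without downloading anything, the `tips` frame here is a small made-up stand-in that only borrows the column names `total_bill` and `tip`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
})

# figsize=(6, 4) corresponds to the "6 and 4" mentioned in the video.
fig, ax = plt.subplots(figsize=(6, 4))

# Axes-level call: the data frame goes in `data`, column names go in x and y.
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=ax)

# In a script, finish with plt.show(); a notebook renders the figure itself.
```

Passing `ax=ax` is optional; without it seaborn draws on the current Matplotlib axes.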
12:25:14 One thing to point out that we haven't pointed out yet is that both the horizontal and the vertical axis labels are provided by the column names that you've given.
12:25:25 This is done by default, and I will point out it's bad practice to leave them
12:25:29 if, let's say, this isn't just for internal use and you're going to display this plot to some sort of audience or somebody else.
12:25:36 Do not leave the raw column names unless they're already a nice, human-readable thing.
12:25:44 So in a later notebook we'll learn some ways that you can change it.
12:25:47 But one way is, you can just use the stuff we learned about Matplotlib.
12:25:51 You can do ax.set_xlabel to be total bill in dollars, and then ax.set_ylabel
12:26:00 to be tip amount in dollars.
12:26:07 So you always want
12:26:13 to have human-readable labels, meaning the way that humans read words, not computers.
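As a rough sketch of that relabeling, the snippet below draws the same kind of scatter plot and then overwrites the default column-name labels with `ax.set_xlabel` and `ax.set_ylabel`. The `tips` frame is again a made-up stand-in, and the exact label wording is just one reasonable choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=ax)

# seaborn labeled the axes "total_bill" and "tip"; replace them with
# labels a human audience can read.
ax.set_xlabel("Total bill (in dollars)")
ax.set_ylabel("Tip amount (in dollars)")
```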
12:26:20 Okay. Alright. So what are some other scatterplot arguments?
12:26:26 So we've seen the hue argument, which allows us to color our scatter plot markers by another variable. In addition, you can go ahead and change the way that the color shows up using the palette argument.
12:26:40 So this argument can be a list of colors that you want,
12:26:43 if it's categorical, or it can be a color palette from seaborn, which you can find on this list of all the different color palettes.
12:26:51 So, for example, sns.color_palette with pastel, so you can use these as well.
12:26:59 You can change the size of the points based on a variable with size, and if, let's say, you have a categorical variable that you're feeding into this,
12:27:10 you can use the sizes argument, where you're going to input a dictionary that has a mapping from the unique values for that column to the size that you'd like.
12:27:23 So we'll see an example of this in just a little bit.
12:27:25 You can also change the style of the markers with the style argument, and then you can control which markers are used for which values with the markers argument, and so forth.
12:27:38 It also can take in all of the standard Matplotlib
12:27:40 scatter plot arguments, like alpha, color, edgecolor, s,
12:27:43 and linewidth. So let's go ahead and we're gonna see a few examples.
12:27:48 So you know, we've got hue, and we can color our points by the day of the week
12:27:52 by just putting, and I believe the column is day, correct? Yes, day. Okay.
12:27:57 So now they're colored, and then maybe I don't know.
12:28:01 I'm gonna try and code on the fly and then see if I get it right.
12:28:05 So let's say palette is equal to,
12:28:08 and why don't we try this pastel one.
12:28:15 So we'll copy and paste.
12:28:19 Okay. Yup.
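A minimal sketch of the hue and palette step, assuming a made-up stand-in for the tips data with a `day` column. The string "pastel" is one of seaborn's named palettes.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "day": ["Thur", "Fri", "Sat", "Sun", "Thur", "Fri", "Sat", "Sun"],
})

fig, ax = plt.subplots(figsize=(6, 4))

# hue colors the markers by the day column; palette picks the colors.
sns.scatterplot(
    data=tips, x="total_bill", y="tip",
    hue="day", palette="pastel", ax=ax,
)
```

A list of color names with one entry per day should also work in place of the palette name.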
12:28:24 Alright. So another thing we can do is set the size, so let's say we want to set the size to be the size column.
12:28:32 There we go, and I will point out, if you tried to put in a number,
12:28:36 I believe you will get an error. Oh, I guess not.
12:28:39 It's just going to assume that they all have the same value.
12:28:42 But that's not quite what we want, so we'll see an example later where you can set the size to be a uniform size using the s argument.
12:28:50 I know it's confusing, s and size, but this is how seaborn works.
12:28:54 So you'll see that the size is, like, giving you a legend.
12:28:58 That's because it's assuming that the size column is a categorical variable.
12:29:04 And so each category, 1, 2, 3, 4, 5, 6, is being assigned to a unique size.
12:29:10 So in that situation, you can create a dictionary where each of your categorical variable's possible values is mapped to a distinct size.
12:29:20 And so here we can see that we're gonna map
12:29:23 1 to 50, 2 to 100, 3 to 150, and so forth.
12:29:29 And then you can use the sizes argument to control how big each of those points is.
12:29:33 Right? So there's an example where you can see the difference between no sizes
12:29:39 argument here and then where sizes is used. You can also just create, you know,
12:29:46 let's say you have an array of sizes
12:29:47 you want to use, or just a number, you can just use the s argument from Matplotlib's scatter to create various sizes.
12:29:55 Okay.
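The difference between size, sizes, and s can be sketched as below. The data is a made-up stand-in for tips; the mapping 1 to 50, 2 to 100, 3 to 150 comes from the video, and the entry for 4 is an assumed continuation of that pattern.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the tips data set (values are made up).
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "size": [2, 3, 2, 4, 1, 4, 2, 3],
})

# Each value of the size column is mapped to a marker size.
# The entry for 4 is an assumed continuation of "and so forth".
size_map = {1: 50, 2: 100, 3: 150, 4: 200}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left: marker size follows the size column through the sizes dictionary.
sns.scatterplot(data=tips, x="total_bill", y="tip",
                size="size", sizes=size_map, ax=axes[0])

# Right: one uniform marker size through Matplotlib's s argument.
sns.scatterplot(data=tips, x="total_bill", y="tip", s=40, ax=axes[1])
```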
12:29:57 So there are other arguments for scatterplot, but I think we've gone over the main ones that you might be interested in using.
12:30:04 If you would like to learn more, you can go to the scatterplot documentation here and see all of the possible ways that you can alter the appearance of your scatterplot.
12:30:16 The next plot that we're gonna learn about, or at least the next plotting function, sorry, that we're gonna learn about is lineplot.
12:30:23 So lineplot is seaborn's way of making a line plot.
12:30:26 So it's kind of like plot from Matplotlib,
12:30:30 but with some slight differences that we'll see. So the first thing I'm gonna do is just generate some data, which is a random walk with 100 steps, and I'm going to store that in df, and then we're going to show you the format. So I'm just going to first input my
12:30:45 data frame as my data argument. And then for x, I'm going to put the step column, and then for y I'm going to input the position column.
12:30:56 So x is the horizontal coordinates, and y is the vertical coordinates.
12:31:01 And now, when I run this, I can see I've plotted my random walk.
12:31:05 Okay, so something that distinguishes lineplot from Matplotlib's plot function is that lineplot has a different way of handling situations where maybe I have repeated x values.
12:31:20 So one way to think about this is maybe this random walk process,
12:31:24 I did it 100 unique times. And so, if I do it 100 unique times, each of the steps, 0 through 99, is going to have 100 positions, one for each of the random walks. So Matplotlib
12:31:42 would sort of plot it, and then go back and then plot it, and then go back and then plot it and go back.
12:31:46 lineplot is going to sort of average all of the values you give it. So in order to see that, I'm going to make this data frame, random_walks.
12:31:58 So random_walks.head. So there's walk, which we can show you.
12:32:05 random_walks.walk.value, and maybe random_walks.walk.unique.
12:32:19 random_walks.walk, sorry about that. So you can see we have 100 different random walks, and then on top of that, you know, for each random walk we keep track of the step and the position, and on top of that I've sort of imagined we're maybe doing some sort of intervention
12:32:33 or something, so there's also study_group, which has A and B. Okay, so this will come up in a later example.
12:32:41 So the way that this works is, by default, we're gonna have 100 positions at step 0, 100 positions at step 1, 100 positions at step 2,
12:32:52 because we have 100 unique random walks. So when this sort of thing happens, by default lineplot is going to plot the arithmetic mean of the y values for each repeated value of x. And so, as an example, let's say that our data frame had the following 4 points for x equals
12:33:11 21. It would then, for the Y value plot the average of 1013, negative set 17 and negative 2, which is 4.
12:33:20 Okay, so let's go ahead and see this in action.
12:33:25 Alright! So this middle solid line, this is the average y position, and these shaded bands that you're seeing, these are sort of the error bars. We're gonna actually talk more about what these error bars are doing
12:33:39 and how they're interpreted in a later, distinct notebook.
12:33:43 But for now just think of them as like error bars or confidence intervals,
12:33:47 sort of telling you, like, what's the plausible range.
12:33:51 That's one way to think about it for now, until we get more specific. It's not the 100% correct way.
12:33:57 So if you're a stats person, don't come at me, but we're gonna be more specific about what these shaded bars are plotting in a later notebook.
12:34:06 Okay, so let's say you actually don't want the standard error, the standard deviation, confidence interval, whatever. You can change the errorbar argument to be None.
12:34:16 And when we do that you're just going to get left with the average position.
12:34:20 Okay. And you might think, wow, that looks really different. That's because the bounds here go from negative 10 to 7.5, whereas the bounds here go from negative 1.5 to 0.25.
12:34:32 So that's why the 2 lines look so different. So you can also change it.
12:34:36 So this is the average position, but we can also change it to other different averaging functions.
12:34:42 So, for instance, we can plot the median position for each value of x instead.
12:34:47 So here, here's that average position. That is the default.
12:34:51 And then here we can change the estimator to be median instead of mean, and when we do that we'll get the median position.
12:34:59 So we can plot this, and we can see the mean position versus the median position.
12:35:05 We can also just turn off the averaging effect altogether if we set estimator equal to None, and there's sort of a weird way of doing this, so we're gonna show off this units argument in a second, but let's say that we have something with repeated x's
12:35:22 like we do here. If you set the estimator off for this, it gives you this really weird effect.
12:35:28 So this is inadvisable if you have a situation like this, where you have 100 distinct values for each value of x, so it doesn't really know how to handle that.
12:35:40 But for instance, let's say we wanted to use this to plot all 100 random walks at once.
12:35:45 We can do that with this units argument that I've commented out for now.
12:35:51 So if I set units equal to the specific column which keeps track of the walk, it will now plot all 100 distinct random walks.
12:36:00 And so here's a situation where you do want estimator equal to None, because it doesn't need to calculate the mean, because it's just going to plot each individual walk.
12:36:09 So some other lineplot arguments are, you know, we have hue and size and style, which work exactly the same way as they do in scatterplot.
12:36:21 The only difference here is, in addition to markers, which would control what kind of markers to put on your line plot,
12:36:27 you have a dashes argument, to which you can provide a dictionary with the dash patterns.
12:36:33 You can also use the standard plot arguments like color,
12:36:36 alpha, linewidth, etc. So here's some examples, where here I'm gonna color the line by the study group.
12:36:47 Okay. And so we can see now, you know, study group A's line.
12:36:52 And so once again, this is an important thing to point out when we have something like this, it's going to split off.
12:36:56 And then average all the ones of study group A, so that's this blue line, and all of the ones of study group B. So here it seems like whatever our, quote, intervention was resulted in drastically different random walks.
12:37:11 Okay. So another thing you can do is, currently it sort of does the Matplotlib default, right, of blue and orange.
12:37:19 But we could also specify that we want our first study group to be red and our second study group to be black with the palette argument.
12:37:30 Okay, we can also just set the color of the line altogether with the color argument, so now it's the same line as before, right?
12:37:40 But instead of being blue, it's red.
12:37:46 And there are some other arguments that you might be interested in exploring again.
12:37:50 I'm gonna leave it to you because it's an extensive list.
12:37:52 But there is a documentation link,
12:37:58 I think there should be one. There's a documentation link here that goes to lineplot, where you can kinda look at all these on your own time and, you know, see the different ways that you can change your line plot.
12:38:12 Okay. So we've looked at the 2 axes-level functions for relational plots.
12:38:18 The upper level, the figure-level function, is called relplot, for relational plot.
12:38:25 This is sort of our parent in the sense that we can make either a scatter or a line plot by calling relplot, and the way to control that is with the kind argument.
12:38:35 So here, why don't we go ahead and do a line plot?
12:38:39 So we're gonna say data is equal to random_walks.
12:38:44 My x is again going to be the step. My y is the position, and then finally, I'll put in the argument
12:38:53 kind is equal to line. So by setting my kind equal to line, it's going to produce the same line plot as above,
12:39:02 noticing that the difference is just the aesthetics of the axes themselves.
12:39:08 Okay, so, just as a reminder, when you have a figure-level function, you're able to make a facet grid.
12:39:17 So let's say I want my columns to be the walk itself.
12:39:22 So I want each random walk to have its own plot.
12:39:25 But remember, and even in this example, where I've subset the data, there's quite a few random walks.
12:39:31 So, if you have something that you suspect is going to result in too many columns to be readable, you can input this argument called column wrap.
12:39:43 So col_wrap, col underscore wrap, with a w.
12:39:48 This is gonna go ahead and say, like, after 3 columns are drawn, go on to the next row.
12:39:56 Okay.
12:39:58 So you can see that after we get through 3 walks we go to the next row.
12:40:03 Plot another 3 walks in their own grid, and so forth.
12:40:08 So there are 2 spins on relplots that are worth knowing, and these are regression-based spins.
12:40:15 So these functions kind of draw both a scatter plot and a line plot,
12:40:18 in a sense. So both of them are going to plot a scatter plot of the variables you're interested in.
12:40:23 But then, on top of that, they plot a line that gives the fitted regression line for some sort of regression.
12:40:31 The default is just a regular ordinary least squares regression.
12:40:35 But there are other options as well. So the figure-level version of this is called lmplot,
12:40:41 for linear model plot, and then the axes-level one is called regplot, both of which have their documentation here.
12:40:48 So we're gonna return to this tips data set.
12:40:51 So my data for regplot, as my first example, is going to be data equals tips. And then I'm gonna go ahead and put in my total_bill on the x
12:41:03 and my tip on the y, okay? So now you also see these arguments here, scatter_kws equals
12:41:14 alpha 0.6, and line_kws equals color black.
12:41:18 So what this does is it changes the aesthetic arguments of the scatter points.
12:41:24 So I'm going to make them a little bit see-through. And then I'm changing the aesthetic options of the line that gets drawn to make it a black line.
12:41:31 Okay, so this was just by default; like, let's say I got rid of this line_kws, this is what it looks like.
12:41:38 And I kind of find it personally difficult to distinguish the blue on top of the blue, so I like to add in this argument here, so that my line is a different color from my points.
12:41:52 Okay, so this is the ordinary least squares regression, regressing tip on total bill. We can also change the type of regression that happens.
12:42:02 You can do sort of a local regression by setting the argument lowess
12:42:07 equals True, and here's what that looks like. So this is the local regression, and you can compare it to the simple linear regression.
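A minimal regplot sketch with the scatter_kws and line_kws dictionaries. The `tips` frame is synthetic (the linear relationship and noise level are made up), and the lowess variant is only mentioned in a comment because, as far as I know, seaborn needs statsmodels installed for it.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data set (relationship is made up).
rng = np.random.default_rng(5)
total_bill = rng.uniform(5, 50, size=60)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 1.0 + 0.15 * total_bill + rng.normal(0, 1.0, size=60),
})

fig, ax = plt.subplots(figsize=(6, 4))

# scatter_kws styles the points (slightly see-through); line_kws styles
# the fitted line (black, so it stands out from the blue points).
sns.regplot(data=tips, x="total_bill", y="tip",
            scatter_kws={"alpha": 0.6},
            line_kws={"color": "black"},
            ax=ax)

# Adding lowess=True to the call above would ask for a local regression
# instead of the ordinary least squares line.
```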
12:42:18 So regplot does not currently include arguments like hue and style.
12:42:25 However, the figure-level version of the function does. So lmplot allows us to color by a variable like sex.
12:42:36 So when you do this, what actually happens is you're regressing tip not just on total bill, but also on the sex variable.
12:42:44 So in the background, it's running a regression on total bill.
12:42:48 And then some sort of indicator for male or female, and then what gets plotted is the resulting
12:42:55 lines, and then, as I said before, these are error bars, and we'll talk more about these explicitly
12:43:02 in a later notebook. And then just once again, on top of, you know, hue, you also, because it's a figure-level function, have the ability to do a facet grid.
12:43:13 So here's an example where we take tip and then regress on both sex and day of the week, and you can see that maybe there's a possibility that the day of the week makes a difference.
12:43:26 However, the number of observations we have for the different days of the week is a little bit small. So these are nice tools for exploratory data analysis,
12:43:36 particularly if you're working on regression. So it allows you to get a sense of, like, maybe this is an important variable to include.
12:43:44 Maybe this is not, and it also allows you to look at different interactions.
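An lmplot sketch with hue and a facet column, on a synthetic stand-in for tips (all values and the balanced day and sex assignment are made up). In practice seaborn draws one fitted line per hue level within each panel, each fit on that level's rows.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips data set (values are made up).
rng = np.random.default_rng(6)
n = 80
total_bill = rng.uniform(5, 50, size=n)
tips = pd.DataFrame({
    "total_bill": total_bill,
    "tip": 1.0 + 0.15 * total_bill + rng.normal(0, 1.0, size=n),
    # Balanced assignment so every day/sex combination has 10 rows.
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "sex": np.repeat(["Female", "Male"], n // 2),
})

# hue colors points and lines by sex; col makes one panel per day,
# wrapped onto a new row after 2 panels.
g = sns.lmplot(data=tips, x="total_bill", y="tip",
               hue="sex", col="day", col_wrap=2, height=3)

print(len(g.axes.flat))
```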
12:43:48 So a potential drawback of these 2 functions, though, is, it might be desirable, right, you've already fit this regression in the background and then plotted the line, so could you please give me the coefficients, the intercepts, etc.?
12:44:04 seaborn does not currently have that functionality, and according to this link from the seaborn GitHub page, the creator and developer of seaborn has no desire to include that functionality.
12:44:17 So you're limited to really only using this as an exploratory data analysis tool, and then if you finally have a regression model that you like, you are gonna have to fit it yourself using something like scikit-learn or statsmodels, and then, you know, find basically all of the
12:44:35 coefficients on your own using whatever program or package you'd like.
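As one way to recover the numbers yourself, the sketch below fits a straight line by least squares with NumPy's `polyfit`. The lecture suggests scikit-learn or statsmodels; NumPy is swapped in here only to keep the example light, and the data is synthetic with a made-up true slope of 0.15 and intercept of 1.0.

```python
import numpy as np

# Synthetic stand-in for total_bill and tip (relationship is made up).
rng = np.random.default_rng(0)
total_bill = rng.uniform(5, 50, size=100)
tip = 1.0 + 0.15 * total_bill + rng.normal(0, 0.5, size=100)

# A degree-1 polynomial fit is an ordinary least squares line;
# polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(total_bill, tip, deg=1)

print(f"tip ~ {intercept:.2f} + {slope:.2f} * total_bill")
```

For standard errors, p-values, or more than one predictor, a statsmodels or scikit-learn fit is the more natural tool.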
12:44:40 So now we've been introduced to the relational plots.
12:44:43 We've got our scatter plots. We've got our line plots. We've seen some examples of including different aesthetic arguments like hue, size, color.
12:44:55 And then we've also learned a little bit more about doing regression-based plots.
12:44:59 that sort of build on and combine the scatter plot and the line plot.
12:45:03 So in the next notebook we'll move on to the next of the 3 plot categories, which is distribution plots, where we'll plot various empirical distributions of variables.
12:45:15 I hope you enjoyed watching this video. I enjoyed having you watch this video, and I hope to see you next time.