Error Bars and Confidence Intervals Video Lecture Transcript This transcript was automatically generated by Zoom, so there may be discrepancies between the video and the text. 14:32:53 Hi, everybody! Welcome back. In this video, we continue to learn about seaborn. 14:32:58 And we're gonna learn how the error bars and confidence intervals are drawn on certain seaborn plots. 14:33:04 So let's go ahead and get started. So we've seen a number of plotting functions that produce error bars or some sort of confidence interval. 14:33:14 So, for instance, lineplot, regplot and lmplot, 14:33:19 barplot, and pointplot. So some of these features can be useful. 14:33:28 These bars and these bands can be useful in trying to help us gauge a couple of things. 14:33:33 So, for instance, the uncertainty affiliated with the presented statistical estimate, 14:33:38 for example, with confidence intervals or the standard error, and then sort of the spread of the data around the statistical estimate, with the standard deviation or something called the percentile interval. 14:33:50 So as with any statistical process, it's good to not go in blind, and it's useful to know how these are made. 14:33:59 So by understanding how these estimates and these error bars are made in seaborn, 14:34:04 it helps us understand sort of the limitations of them by consulting with our statistical texts and knowledge. 14:34:12 So I just want to use this notebook as an opportunity to explain how these various error bars are drawn, 14:34:20 so that way you understand what you're seeing when you look at them on a seaborn-produced plot. So we're gonna break this down into 2 14:34:28 sort of categories. The first are those functions that estimate statistics, 14:34:34 like lineplot with the mean or the median, barplot with means, proportions, or the median, and then the same thing with pointplot.
14:34:43 Right, so you feed the data into these, they calculate an estimate, and then they provide an error bar or error band of some kind. 14:34:52 So the 4 types that you can produce are the confidence interval, 14:34:56 the percentile interval, the standard error interval, which is just the estimate plus or minus the standard error, 14:35:05 and then the standard deviation interval, which is the estimate plus or minus the standard deviation of the data. 14:35:12 So the first 3 of these, the confidence intervals, the percentile intervals, and the standard error intervals, are calculated using a technique called bootstrapping. 14:35:24 And this is not a class in statistics, 14:35:26 so I'm not going to dive too far into what bootstrapping is, but the main idea is that, many times, you're going to subsample the original data set: 14:35:37 take a sample of it randomly, calculate the statistic you're interested in, 14:35:44 so, for instance, the mean. Then you'll have a set of, say, 100 means, and you use those to calculate the confidence interval, or the percentile interval, or the standard error interval, 14:35:55 using those, let's say, 100 randomly sampled sets from the original data. 14:36:04 Again, if you click on this link, it will take you to the Wikipedia page on bootstrapping, so you can learn more about it for yourself. 14:36:11 But for now, that's just the general idea of how bootstrapping works. 14:36:16 The final error bar type, the standard deviation, is just obtained by calculating the standard deviation of the observations. So we can select which type of error bar is drawn with the errorbar argument. 14:36:30 And so again, we're going to use our penguins data set as an example. 14:36:33 So we're gonna go ahead and start with a pointplot where we draw the body mass against the species and then, not stratify, but also subcategorize by sex.
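The bootstrapping idea described above can be sketched in a few lines of plain Python. This is only an illustration, not seaborn's actual implementation: it assumes the usual form of the bootstrap (resampling with replacement, same size as the original data), the function name bootstrap_ci and its parameters are hypothetical, and the body masses are made up.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=1000, width=95, seed=0):
    """Percentile bootstrap interval for the mean of `values` (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement n_boot times and record the mean of each resample.
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Keep the middle `width` percent of the bootstrapped means.
    tail = (100 - width) / 200
    lo_idx = int(tail * (n_boot - 1))
    hi_idx = int((1 - tail) * (n_boot - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


# Made-up body masses in grams, standing in for one group of penguins.
masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]
low, high = bootstrap_ci(masses)
print(f"mean = {statistics.fmean(masses):.1f}")
print(f"95% bootstrap interval = ({low:.1f}, {high:.1f})")
```

Fixing the seed makes the interval reproducible from run to run; without it, the endpoints would shift slightly each time because the resamples are random.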
14:36:47 So let's say we want to use the standard deviation. 14:36:51 We would say errorbar equals "sd" as a string, so now these bars that are being drawn are the standard deviations. By default, 14:37:01 so like, let's say I commented this out and just drew it as is, I believe these error bars by default are the confidence intervals, and we can always check that 14:37:12 to see, if we set this equal to "ci", if we receive the same plot, and it looks like we do. 14:37:19 Okay, so just as a quick double check. Yep, okay. 14:37:26 So let's go back to the original desired one, which was the standard deviation. 14:37:31 So this is the standard deviation around the bar. So another argument. 14:37:38 So you can just give a string, and it will use whatever default settings it would like. 14:37:42 Another way you can give an argument to errorbar is with a tuple. 14:37:46 So these are the following tuple forms you can use. First, you're going to have your 0 entry 14:37:50 be the string of the type you want, and if your type is of an interval kind, you put the size. So for the confidence interval, the size can be between 0 and 100 inclusive, and, for instance, the common one that we've all heard of, right, is a 95% confidence 14:38:09 interval. So you put 95 in, and you'll get a 95% confidence interval. 14:38:17 Similarly, for the percentile interval, you put "pi" and then the size. For the standard error and standard deviation, you have to put the scale, which is just a number bigger than 0. 14:38:29 And so this is like, if you put in, for instance, 2 with the standard deviation, it would be the estimate plus or minus 2 standard deviations. So that's exactly what we'll do. 14:38:41 But I guess instead of 2, let's say I want to use 3. 14:38:45 So we put in our tuple, "sd" comma 3. 14:38:48 And now my error bars go from the estimate up to 3 standard deviations away in either direction.
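As a rough illustration of what the ("sd", 3) tuple means numerically, the sketch below computes the estimate plus or minus a scaled standard deviation by hand. The helper name sd_interval and the data are hypothetical, and the use of the sample standard deviation is an assumption. The seaborn call shown in the comment is a guess at what the notebook cell looks like; its column names assume the standard penguins data set.

```python
import statistics


def sd_interval(values, scale=1):
    """Return (estimate - scale * sd, estimate + scale * sd), with the mean as the estimate."""
    center = statistics.fmean(values)
    # Assumption: sample standard deviation (n - 1 in the denominator).
    half_width = scale * statistics.stdev(values)
    return center - half_width, center + half_width


# Made-up body masses in grams for a single group.
masses = [3750, 3800, 3250, 3450, 3650, 3625, 4675, 3475, 4250, 3300]

one_sd = sd_interval(masses, scale=1)
three_sd = sd_interval(masses, scale=3)
print("estimate +/- 1 sd:", one_sd)
print("estimate +/- 3 sd:", three_sd)

# The corresponding plot call from the lecture would look something like this
# (assumed, since the notebook code is not part of the transcript):
# sns.pointplot(data=penguins, x="species", y="body_mass_g",
#               hue="sex", errorbar=("sd", 3))
```

Tripling the scale triples the length of the bar, which is why the bars in the plot become so much longer when 3 is used in place of the default.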
14:38:56 So to learn more about additional error bar options, and maybe learn more about how these are implemented, 14:39:05 I've provided both the tutorial on the error bars and then this nice post on the issues page of the GitHub 14:39:11 that explains how percentile intervals are calculated in seaborn, 14:39:17 or are drawn, I guess. The other type of function we'll look at are those that plot regressions. 14:39:24 So remember, maybe as a quick aside before we do this part, with regplot or lmplot we're plotting our points along with, by default, 14:39:37 an ordinary least squares regression, and then we have these error bars. 14:39:41 So currently the only error bars that you can get with lmplot 14:39:46 are the confidence interval bars, and these are also drawn and decided using bootstrapping. 14:39:53 So they don't use the confidence interval formula that comes from your standard statistics courses; they use bootstrapping to calculate these intervals. 14:40:02 So you can choose the size of your confidence interval, like 95%, 99%, etc., 14:40:10 by using values between 0 and 100. So, for instance, I could put in 99.999. 14:40:18 Unexpected indent. Oh, that's 'cause I... here we go. And I can get the bars that way. 14:40:24 I could put in 10, and I'll get a much narrower, 14:40:28 a much smaller bar. So now it looks like the bar is virtually not existing, because a 10% confidence interval is very small. 14:40:39 Or I could go back to the default, which is 95. Alternatively, I can turn off the confidence interval altogether by setting ci equal to False. 14:40:50 And now I just have the lines as they were. Okay.
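To see why a 10% interval looks so much thinner than a 99% one, here is a minimal sketch of a bootstrapped interval around a fitted regression value. It is not seaborn's code: it assumes the bootstrap resamples (x, y) pairs with replacement and refits an ordinary least squares line each time, it evaluates the band at a single x value where seaborn evaluates it across a grid of x values, and the names fit_line and bootstrap_band and the synthetic data are all hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for a single predictor."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def bootstrap_band(xs, ys, x_new, ci=95, n_boot=1000, seed=0):
    """Percentile interval for the fitted value at x_new (illustrative sketch)."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    preds = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit the line.
        bx, by = zip(*rng.choices(pairs, k=len(pairs)))
        if len(set(bx)) < 2:
            continue  # skip a degenerate resample with a single distinct x
        slope, intercept = fit_line(bx, by)
        preds.append(intercept + slope * x_new)
    preds.sort()
    # Keep the middle `ci` percent of the refitted predictions.
    tail = (100 - ci) / 200
    lo = preds[int(tail * (len(preds) - 1))]
    hi = preds[int((1 - tail) * (len(preds) - 1))]
    return lo, hi


# Synthetic data: a noisy straight line.
noise = random.Random(1)
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + noise.gauss(0, 3) for x in xs]

wide = bootstrap_band(xs, ys, x_new=10.0, ci=99)
narrow = bootstrap_band(xs, ys, x_new=10.0, ci=10)
print("99% band at x = 10:", wide)
print("10% band at x = 10:", narrow)
```

Because both calls use the same seed, the 10% interval is nested inside the 99% interval, mirroring the way the band shrinks toward the line when a smaller ci value is passed.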
So you can learn more, again, about the error bars for the regression variables by checking out the error bar tutorial or the regression tutorial from the seaborn documentation. So now you have a better idea of how 14:41:07 these various error bars and bands are drawn, so that will help you to better understand and communicate your charts to your audience, as well as understand them for yourself. 14:41:20 So I hope you enjoyed learning about this. In the next notebook, 14:41:22 we'll talk about how to change various non-graphical aesthetics using seaborn. 14:41:29 And I think that will then be basically our last seaborn notebook where we're coding things up. 14:41:35 Okay. So I enjoyed having you watch this video. I hope to see you in the next video.