Data for seaborn Video Lecture Transcript

This transcript was automatically generated by Zoom, so there may be discrepancies between the video and the text.

11:57:04 Hi, everybody, welcome back! We're going to continue to learn about seaborn and start talking about the different data that you can put into a seaborn plotting function.
11:57:14 So let's go ahead and dive into the Jupyter notebook and learn more.
11:57:19 So as we spoke about in the introduction, seaborn was a package created for statistical visualizations, meaning that it's going to need some data in order to make those visualizations.
11:57:29 So in this notebook we're going to talk about two things.
11:57:33 What's the specific format of data that seaborn likes to receive?
11:57:36 And what are a few different ways that you can input data into a seaborn plotting function?
11:57:41 So we're not going to learn the ins and outs of individual functions in this notebook.
11:57:45 But what we will learn is the general format for inputting data into a seaborn plotting function, and then how you can use the different aspects of that data to set the arguments of specific functions.
11:57:58 So seaborn prefers to get data in what's known as a long-form format.
11:58:02 And this might seem familiar to you if you're used to data frames or spreadsheets, where you basically have a series of unique columns and rows.
11:58:15 Each column represents a variable, and each row represents an observation of those variables.
11:58:23 So for instance, maybe this is a time series of various stock prices.
11:58:27 So the first column could be the trading day, the second column could be the closing price for one stock, the third the closing price for a different stock, and so forth.
11:58:38 So that's basically it.
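As a rough illustration of the layout just described, here is a minimal pandas sketch with one column per variable and one row per observation. The column names, dates, and prices are invented for illustration and are not taken from the lecture notebook.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration.
stocks = pd.DataFrame(
    {
        "trading_day": pd.to_datetime(["2023-01-03", "2023-01-04", "2023-01-05"]),
        "stock_a_close": [101.2, 102.5, 100.9],
        "stock_b_close": [55.0, 54.3, 56.1],
    }
)

# Each column is a variable; each row is one observation (one trading day).
print(stocks)
```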
Seaborn prefers to get data in this format, meaning that you can input things like lists and tuples directly into the arguments, which we'll see an example of in a second. Data frames, that is, pandas data frames and series, fall into this format as well, as does
11:58:55 just a regular old dictionary, where the columns are the keys of the dictionary.
11:59:01 Or, you know, the columns are specified by the keys of the dictionary, and the values would be something like a list or a tuple.
11:59:06 So we're actually going to see examples of all this with scatterplot, which is a function that creates a scatter plot
11:59:13 given the data input. So I'm going to go ahead and randomly generate some data, and then I'll just show you what this data is.
11:59:20 So x is a NumPy array, y is a NumPy array,
11:59:25 and then labels is an array of labels, so a's and b's. Okay, so the first way we can enter data is just to directly enter the arrays or lists or tuples themselves.
11:59:41 So scatterplot takes in an x argument, so x will just be equal to x.
11:59:45 Then a y argument for the vertical variable, so y is equal to y, and then finally, we can assign colors by setting an argument equal to a categorical variable.
11:59:58 So color is equal to labels. And so this will color the points
12:00:02 according to the label. So points that are b will be one color and points that are a will be a different color.
12:00:09 Okay, so this is one way to do it. Oh, jeez, what did I do?
12:00:21 Hue. Sorry, not color. I meant hue here, so hue is the argument for color in seaborn.
12:00:28 So color is a different argument that you can use, and we'll learn more about this in a later notebook.
12:00:35 But the argument that allows you to specify the color based upon an array or a tuple is hue.
12:00:42 Okay. All right. So that was one way. And I also mentioned that we can use dictionaries and data frames.
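The first approach described above, passing the arrays directly, might look like the following sketch. The way the data is generated here (seed, sample size, normal draws, and the label values "a" and "b") is an assumption standing in for the notebook's random data, which the transcript does not show.

```python
import numpy as np
import seaborn as sns

# Assumed stand-in for the notebook's randomly generated data.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)
labels = rng.choice(["a", "b"], size=100)

# Pass the arrays themselves to x, y, and hue.
# hue (not color) is the argument that colors points by a categorical variable.
ax = sns.scatterplot(x=x, y=y, hue=labels)
```

In a notebook the plot renders automatically after the cell runs; in a plain script you would typically call `matplotlib.pyplot.show()` or save the figure.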
12:00:48 So let's see how to do that. When we do this directly, we have to provide each argument its own list, tuple, or array. When we use a data frame, it's actually pretty nice. So I'm going to just take x, y, and labels and turn them into a data frame. So
12:01:05 here are our data frame's first five entries. So x, y, label.
12:01:09 And when we have a data frame, the first argument we put in is data equal to df, so now our data frame has been input to the plotting function.
12:01:21 And when we enter something like a data frame or a dictionary, we can go ahead
12:01:26 and now, instead of doing x equals x as a variable, because our data frame has an x column, we put in the string "x". So this is going to be the column name, as a string, of the variable you want on the horizontal axis. Then for the vertical axis we'll
12:01:43 put in the string "y", because that's the variable we want for the vertical axis, and then finally, for the hue we're going to put in the string for the label column, which is "label".
12:01:54 So once you input a data frame or a dictionary, as we'll see in the next example, all of the other arguments can be input as strings of column names.
12:02:04 So now we'll plot this, and you can quickly compare: it's the same plot as if we just fed the data to it directly, the nice feature being that we had a data frame, which is how a lot of our data is going to come to us,
12:02:18 in at least this course. So we have this data frame, and then we can specify the column names that we want to use for the various plotting arguments.
12:02:27 In addition to a data frame, a dictionary is also a columnar data type.
12:02:33 So we have our columns, which are the keys, and then the actual values, the rows of each variable, are the values.
12:02:41 Okay. And then we can look at the data dictionary.
12:02:48 Okay, so it's the same exact thing.
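The data frame pattern described above can be sketched as follows. The column names "x", "y", and "label" follow the transcript; the generated values are an assumed stand-in for the notebook's data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Assumed stand-in data; only the column names come from the lecture.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "x": rng.normal(size=100),
        "y": rng.normal(size=100),
        "label": rng.choice(["a", "b"], size=100),
    }
)

# First five entries of the data frame.
print(df.head())

# With data=df, the other arguments are column names given as strings.
ax = sns.scatterplot(data=df, x="x", y="y", hue="label")
```

A side effect of passing column names is that seaborn uses them as the axis labels, which the direct-array version does not get for free.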
Now, instead of the data frame as our data argument, our data argument here is just going to be the data dictionary.
12:02:59 And then all of these other arguments I didn't bother typing in again, because they're just the same.
12:03:06 See, so same exact plot. But now we're using a dictionary instead of a data frame.
12:03:12 So either works. If your data already comes in a data frame, like a lot of the examples we'll see in these seaborn notebooks,
12:03:21 just use the data frame. No need to turn it into a dictionary.
12:03:24 This was just to demonstrate that when we say long format, it can just be a series of arrays or tuples or lists, a data frame, or a dictionary.
12:03:45 With a data frame or a dictionary, the variables that you would like for the different plotting arguments have to be entered as the string names of the columns
12:03:53 you're interested in. Okay, so now we know the general format and pattern of entering data into a seaborn plotting function.
12:04:02 In the next notebook, we're going to start actually diving into these plotting functions and tell you the difference between the two possible function types that most seaborn functions fall into.
12:04:14 I hope you enjoyed learning about how to input data into a seaborn plotting function.
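For comparison, the dictionary version of the same call might look like this sketch. The name `data_dict` and the generated values are assumptions for illustration; the keys play the role of column names, and each value is a list.

```python
import numpy as np
import seaborn as sns

# Assumed stand-in data; keys act as column names, values are plain lists.
rng = np.random.default_rng(0)
data_dict = {
    "x": rng.normal(size=100).tolist(),
    "y": rng.normal(size=100).tolist(),
    "label": rng.choice(["a", "b"], size=100).tolist(),
}

# Same call as with a data frame: only the data argument changes.
ax = sns.scatterplot(data=data_dict, x="x", y="y", hue="label")
```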