Supervised Learning Introduction Video Lecture Transcript
This transcript was automatically generated, so there may be discrepancies between the video and the text.
Hi, welcome back. In this video, we're gonna start our series on supervised learning with an introduction to supervised learning. Let me go ahead and share the Jupyter notebook we'll be working on. So this is the introduction notebook stored in the supervised learning folder within the lectures. So, supervised learning is a large subfield of statistical learning that is frequently utilized in data science.
So in this subsection, we're going to learn a little bit more about supervised learning models in general, as well as particular supervised learning algorithms. So we'll start by asking ourselves: what is supervised learning? It's a form of statistical learning problem in which the outcome variable is something we have observed, and we want to be able to predict it for future observations.
In particular, a distinction between supervised learning and other subfields of statistical learning is that the data set we start with, which we'll soon come to know as a training set or a sample, will contain observations of a particular value of interest. So if there's something we're interested in that we want to be able to predict, and we have that data when we start out and can build a model to predict it, that's gonna be known as supervised learning.
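To make the idea of a training set concrete, here is a minimal Python sketch. The rows, the column names (`prior_quiz`, `prior_test`, `test_score`), and the numbers are all made up for illustration and do not come from the lecture notebook. The point is only that every row carries an observed value of the outcome we want to predict.

```python
# Hypothetical toy training set: every row pairs some features with an
# observed value of the outcome we care about (here, a test score).
training_set = [
    {"prior_quiz": 72, "prior_test": 68, "test_score": 70},
    {"prior_quiz": 85, "prior_test": 90, "test_score": 88},
    {"prior_quiz": 60, "prior_test": 65, "test_score": 61},
]

feature_names = ["prior_quiz", "prior_test"]  # what we predict from
target_name = "test_score"                    # what we want to predict

# Split each row into its features (X) and its observed outcome (y).
X = [[row[name] for name in feature_names] for row in training_set]
y = [row[target_name] for row in training_set]

print(X)
print(y)
```

Because `y` is known for every row of `X`, a model can be fit to this data, which is what makes the problem supervised.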
We're gonna typically consider two types of supervised learning problems. In particular, in this series, we're gonna focus on regression and classification. So a regression problem is a problem in which you are using data to predict an outcome that is continuous. Now, I want to point out, importantly, that when we say continuous here, we don't mean continuous in the mathematical sense.
So what we actually mean is something like this: there's not a fixed finite set of possible values, and the thing we're interested in may be able to take on an infinitely large number of values. And maybe it's better to think of this through some examples than to try and sit down and write a hard definition for it. So what I mean here is essentially this example: we can think of the points scored in a basketball game as being a variable we wish to predict, and we would treat that as a regression problem.
But the points scored in a basketball game are not continuous in the sense of mathematics, because you can't score every possible value between two point totals, if that makes sense. So essentially, you can't get a fraction of a point in a basketball game, which, mathematically, we would need to be able to do if we're gonna call something continuous.
Some example regression problems we might be interested in: predicting the price of gas if we were given the time, the price of crude oil, and the temperature, say; predicting the points scored by an NBA team in a game; or maybe predicting how a student will score on a test if we were given the prior test and quiz scores that that student has had.
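As a rough illustration of the student test score example, here is a small Python sketch of a regression model. It assumes a single feature (a prior quiz average) and uses ordinary least squares for a straight line, which is just one possible regression technique and not necessarily the one used later in this series. The data and the names `fit_line` and `predict_score` are hypothetical.

```python
# Made-up data: prior quiz average (feature) and test score (numeric outcome).
prior_quiz_avg = [60.0, 70.0, 80.0, 90.0]
test_score = [62.0, 71.0, 79.0, 92.0]


def fit_line(x, y):
    """Ordinary least squares with one feature; returns (slope, intercept)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


slope, intercept = fit_line(prior_quiz_avg, test_score)


def predict_score(quiz_avg):
    """Predict a test score for a new student from their quiz average."""
    return slope * quiz_avg + intercept


print(predict_score(75.0))
```

Note that the prediction is a number on a numeric scale rather than a category, which is what makes this a regression problem in the supervised learning sense.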
So I will also point out that, while continuous might have a slightly different definition here, we're also gonna think of regression as having a slightly different definition from the statistical notion of regression. A key point here is that we're gonna use a lot of techniques from statistics that have been called regression, but not every statistical regression technique is going to be a supervised learning regression algorithm.
One example: if you've heard of logistic regression, that's actually gonna fall under the header of classification. Even though statistically it is a regression technique, it is not what we think of in supervised learning as regression. So in supervised learning, regression is building a model to predict or explain a data set in which the outcome is a continuous numeric value.
By contrast, classification is going to be a series of problems in which we're gonna use data to predict an outcome that falls into one of many possible classes or categories. So in a classification problem, our y, our output variable, is going to be something that falls into a number of different categories, any finite number of categories. And then we're going to build a model to predict what class an observation should fall into.
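To contrast with regression, here is a small Python sketch of a classification model. It uses a nearest-centroid rule, chosen only because it fits in a few lines; it is an assumption for illustration and not a technique named in this lecture. The two-feature data, the labels, and the names `class_centroids` and `predict_class` are all made up.

```python
# Made-up training data: (feature vector, class label) pairs.
training = [
    ([1.0, 1.2], "cat"),
    ([0.8, 1.0], "cat"),
    ([3.0, 3.2], "dog"),
    ([3.2, 2.9], "dog"),
]


def class_centroids(examples):
    """Average the feature vectors of each class to get one centroid per class."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_class(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


centroids = class_centroids(training)
print(predict_class(centroids, [0.9, 1.1]))
```

Here the output is always one of a finite set of labels, never a value in between, which is the defining feature of a classification problem.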
So some specific problems might be predicting whether an image contains a cat, whether somebody has cancer, or whether somebody will default on a loan. These are a couple of examples of binary classification problems, so two possible classes. Another very common multiclass problem is taking in an image of a handwritten digit, so zero through nine, and then predicting what digit that actually is, or taking in an image of a handwritten letter and then predicting which of the 26 English letters it could be.
So in this section on supervised learning, we're gonna cover a variety of techniques to solve such regression and classification problems. Along the way, we'll cover data preparation procedures for such problems, so there's some preprocessing we'll have to do a lot of the time. We'll see a large variety of examples of problems with a couple of different data sets. Some will be synthetic, others will be real. And we'll discuss foundational data science and machine learning concepts.
So why don't we go ahead and get started by actually talking about the modeling framework that we're going to be leaning on throughout the entirety of the supervised learning series of notebooks. OK. So as I said, that's it for this video. I hope to see you in the next video, where we dive more into supervised learning. Bye.