Supervised Learning
An abstract definition of supervised learning is as follows. Assume the learner is given the following data:
- a set of input features X1,...,Xn;
- a set of target features Y1,...,Yk;
- a set of training examples where the values for the input features and the target features are given for each example; and
- a set of test examples where only the values for the input features are given.
The aim is to predict the values of the target features for the test examples and as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples.
If e is an example, and F is a feature, let val(e,F) be the value of feature F in example e.
Ex. | Author | Thread | Length | Where Read | User Action |
e1 | known | new | long | home | skips |
e2 | unknown | new | short | work | reads |
e3 | unknown | follow Up | long | work | skips |
e4 | known | follow Up | long | home | skips |
e5 | known | new | short | home | reads |
e6 | known | follow Up | long | work | skips |
e7 | unknown | follow Up | short | work | skips |
e8 | unknown | new | short | work | reads |
e9 | known | follow Up | long | home | skips |
e10 | known | new | long | work | skips |
e11 | unknown | follow Up | short | home | skips |
e12 | known | new | long | work | skips |
e13 | known | follow Up | short | home | reads |
e14 | known | new | short | work | reads |
e15 | known | new | short | home | reads |
e16 | known | follow Up | short | work | reads |
e17 | known | new | short | home | reads |
e18 | unknown | new | short | work | reads |
e19 | unknown | new | long | work | ? |
e20 | unknown | follow Up | long | home | ? |
Figure 7.1: Examples of a user's preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion board depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. e1,...,e18 are the training examples. The aim is to make a prediction for the user action on e19, e20, and other, currently unseen, examples.
Example 7.1: Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a bulletin board given properties of the article. The input features are Author, Thread, Length, and Where Read. There is one target feature, User Action. There are eighteen training examples, each of which has a value for all of the features.
In this data set, val(e11,Author)=unknown, val(e11,Thread)=follow Up, and val(e11,UserAction)=skips.
The aim is to predict the user action for a new example given its values for the input features.
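As an illustration of the val(e,F) notation, the data of Figure 7.1 can be held in a small lookup structure. The following is only a sketch: the identifiers WhereRead and UserAction, the helper parse, and the use of "?" to mark an unknown target value are assumptions made for this example, not part of the definition above.

```python
# Rows copied from Figure 7.1; "?" marks the unknown target value of a test example.
TABLE = """
e1 | known | new | long | home | skips
e2 | unknown | new | short | work | reads
e3 | unknown | follow Up | long | work | skips
e4 | known | follow Up | long | home | skips
e5 | known | new | short | home | reads
e6 | known | follow Up | long | work | skips
e7 | unknown | follow Up | short | work | skips
e8 | unknown | new | short | work | reads
e9 | known | follow Up | long | home | skips
e10 | known | new | long | work | skips
e11 | unknown | follow Up | short | home | skips
e12 | known | new | long | work | skips
e13 | known | follow Up | short | home | reads
e14 | known | new | short | work | reads
e15 | known | new | short | home | reads
e16 | known | follow Up | short | work | reads
e17 | known | new | short | home | reads
e18 | unknown | new | short | work | reads
e19 | unknown | new | long | work | ?
e20 | unknown | follow Up | long | home | ?
"""

# Feature names written as identifiers (an assumption of this sketch).
FEATURES = ["Author", "Thread", "Length", "WhereRead", "UserAction"]
INPUT_FEATURES = FEATURES[:-1]
TARGET = "UserAction"


def parse(table):
    """Turn the pipe-separated rows into {example name: {feature: value}}."""
    examples = {}
    for line in table.strip().splitlines():
        name, *values = [cell.strip() for cell in line.split("|")]
        examples[name] = dict(zip(FEATURES, values))
    return examples


EXAMPLES = parse(TABLE)


def val(e, F):
    """Value of feature F in example e, mirroring val(e,F) in the text."""
    return EXAMPLES[e][F]


# Training examples have a known target value; test examples do not.
training = [e for e in EXAMPLES if val(e, TARGET) != "?"]
test = [e for e in EXAMPLES if val(e, TARGET) == "?"]

print(val("e11", "Author"), val("e11", "Thread"), val("e11", "UserAction"))
print(len(training), "training examples;", len(test), "test examples")
```

With this encoding, val("e11", "Thread") returns the same value as val(e11,Thread) in the text, and the split into training and test examples follows directly from whether the target value is known.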
The most common way to learn is to have a hypothesis space of all possible representations. Each possible representation is a hypothesis. The hypothesis space is typically a large finite, or countably infinite, space. A prediction is made using one of the following:
- the best hypothesis that can be found in the hypothesis space according to some measure of which hypotheses are better,
- all of the hypotheses that are consistent with the training examples, or
- the posterior probability of the hypotheses given the evidence provided by the training examples.
One exception to this paradigm is case-based reasoning, which uses the examples directly.
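To make the hypothesis-space view concrete, the sketch below enumerates a deliberately tiny hypothesis space over the training examples of Figure 7.1 and illustrates the first two ways of making a prediction listed above. The choice of hypothesis space (constant predictions plus rules that look at a single input feature) and the measure of "better" (number of training examples predicted wrongly) are assumptions made for illustration; realistic hypothesis spaces are far larger.

```python
from itertools import product

INPUTS = ("Author", "Thread", "Length", "WhereRead")
ACTIONS = ("reads", "skips")

# Training examples e1..e18 from Figure 7.1, one per line, as
# Author,Thread,Length,WhereRead,UserAction.
TRAIN = [line.split(",") for line in """
known,new,long,home,skips
unknown,new,short,work,reads
unknown,follow Up,long,work,skips
known,follow Up,long,home,skips
known,new,short,home,reads
known,follow Up,long,work,skips
unknown,follow Up,short,work,skips
unknown,new,short,work,reads
known,follow Up,long,home,skips
known,new,long,work,skips
unknown,follow Up,short,home,skips
known,new,long,work,skips
known,follow Up,short,home,reads
known,new,short,work,reads
known,new,short,home,reads
known,follow Up,short,work,reads
known,new,short,home,reads
unknown,new,short,work,reads
""".strip().splitlines()]


def hypothesis_space():
    """Yield (description, predict) pairs for a small illustrative space."""
    # Hypotheses that ignore the input features entirely.
    for action in ACTIONS:
        yield f"always {action}", (lambda x, a=action: a)
    # Hypotheses that map each value of one input feature to an action.
    for i, feature in enumerate(INPUTS):
        values = sorted({row[i] for row in TRAIN})
        for outputs in product(ACTIONS, repeat=len(values)):
            if len(set(outputs)) == 1:
                continue  # constant rules are already covered above
            rule = dict(zip(values, outputs))
            yield f"{feature}: {rule}", (lambda x, i=i, rule=rule: rule[x[i]])


def errors(predict):
    """Number of training examples whose target value is predicted wrongly."""
    return sum(predict(row[:4]) != row[4] for row in TRAIN)


space = list(hypothesis_space())

# Option 1: the best hypothesis according to the chosen measure.
best_name, best = min(space, key=lambda h: errors(h[1]))

# Option 2: all hypotheses consistent with every training example.
consistent = [name for name, h in space if errors(h) == 0]

e19 = ["unknown", "new", "long", "work"]
e20 = ["unknown", "follow Up", "long", "home"]
print(best_name, "with", errors(best), "training errors")
print("e19:", best(e19), "| e20:", best(e20))
print("consistent hypotheses:", consistent)
```

In this small space the lowest-error hypothesis is the rule on Length (long articles are skipped, short ones are read), which is wrong only on e7 and e11, and it predicts skips for both e19 and e20. No hypothesis in the space is consistent with all eighteen training examples, which shows why the second option depends on how rich the hypothesis space is.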