Supervised Learning

An abstract definition of supervised learning is as follows. Assume the learner is given the following data:

a set of input features X1,...,Xn;
a set of target features Y1,...,Yk;
a set of training examples where the values for the input features and the target features are given for each example; and
a set of test examples where only the values for the input features are given.

The aim is to predict the values of the target features for the test examples and as-yet-unseen examples. Typically, learning is the creation of a representation that can make predictions based on descriptions of the input features of new examples.

If e is an example, and F is a feature, let val(e,F) be the value of feature F in example e.

Ex.	Author	Thread	Length	Where Read	User Action
e1	known	new	long	home	skips
e2	unknown	new	short	work	reads
e3	unknown	follow Up	long	work	skips
e4	known	follow Up	long	home	skips
e5	known	new	short	home	reads
e6	known	follow Up	long	work	skips
e7	unknown	follow Up	short	work	skips
e8	unknown	new	short	work	reads
e9	known	follow Up	long	home	skips
e10	known	new	long	work	skips
e11	unknown	follow Up	short	home	skips
e12	known	new	long	work	skips
e13	known	follow Up	short	home	reads
e14	known	new	short	work	reads
e15	known	new	short	home	reads
e16	known	follow Up	short	work	reads
e17	known	new	short	home	reads
e18	unknown	new	short	work	reads
e19	unknown	new	long	work	?
e20	unknown	follow Up	long	home	?

Figure 7.1: Examples of a user's preferences. These are some training and test examples obtained from observing a user deciding whether to read articles posted to a threaded discussion board depending on whether the author is known or not, whether the article started a new thread or was a follow-up, the length of the article, and whether it is read at home or at work. e1,...,e18 are the training examples. The aim is to make a prediction for the user action on e19, e20, and other, currently unseen, examples.

Example 7.1: Figure 7.1 shows training and test examples typical of a classification task. The aim is to predict whether a person reads an article posted to a bulletin board given properties of the article. The input features are Author, Thread, Length, and Where Read. There is one target feature, User Action. There are eighteen training examples, each of which has a value for all of the features.

In this data set, val(e11,Author)=unknown, val(e11,Thread)=follow Up, and val(e11,User Action)=skips.

The aim is to predict the user action for a new example given its values for the input features.
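
As a rough illustration of the val(e,F) notation, the following Python sketch stores three rows of Figure 7.1 as mappings from feature names to values. The names FEATURES, ROWS, and EXAMPLES, and the use of None for the unknown target value "?", are assumptions made for this sketch and are not part of the text.

```python
# Feature names, in the column order of Figure 7.1.
FEATURES = ["Author", "Thread", "Length", "Where Read", "User Action"]

# Three rows copied from Figure 7.1. None stands for the unknown
# target value "?" of a test example (an assumption of this sketch).
ROWS = {
    "e1": ("known", "new", "long", "home", "skips"),
    "e11": ("unknown", "follow Up", "short", "home", "skips"),
    "e19": ("unknown", "new", "long", "work", None),
}

# Each example becomes a mapping from feature name to value.
EXAMPLES = {name: dict(zip(FEATURES, row)) for name, row in ROWS.items()}


def val(e, F):
    """Return the value of feature F in example e."""
    return EXAMPLES[e][F]


print(val("e11", "Author"))       # unknown
print(val("e11", "Thread"))       # follow Up
print(val("e11", "User Action"))  # skips
```

A training example has a value for every feature, whereas a test example such as e19 has values only for the input features.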

The most common way to learn is to have a hypothesis space of all possible representations. Each possible representation is a hypothesis. The hypothesis space is typically a large finite, or countably infinite, space. A prediction is made using one of the following:

the best hypothesis that can be found in the hypothesis space according to some measure of what makes one hypothesis better than another,
all of the hypotheses that are consistent with the training examples, or
the posterior probability of the hypotheses given the evidence provided by the training examples.
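
The first two options can be sketched on the training examples of Figure 7.1. The hypothesis space used here is a deliberately tiny one chosen for illustration, not one prescribed by the text: each hypothesis says "the user reads exactly when feature F has value v". The measure of "best" is assumed to be the number of training examples predicted incorrectly, and the names DATA, HYPOTHESES, predict, and errors are hypothetical. The posterior-probability option is not shown.

```python
# Training examples e1..e18 from Figure 7.1, one per line, in the order
# Author, Thread, Length, Where Read, User Action.
DATA = """\
known,new,long,home,skips
unknown,new,short,work,reads
unknown,follow Up,long,work,skips
known,follow Up,long,home,skips
known,new,short,home,reads
known,follow Up,long,work,skips
unknown,follow Up,short,work,skips
unknown,new,short,work,reads
known,follow Up,long,home,skips
known,new,long,work,skips
unknown,follow Up,short,home,skips
known,new,long,work,skips
known,follow Up,short,home,reads
known,new,short,work,reads
known,new,short,home,reads
known,follow Up,short,work,reads
known,new,short,home,reads
unknown,new,short,work,reads
"""
INPUTS = ["Author", "Thread", "Length", "Where Read"]
TARGET = "User Action"
TRAIN = [dict(zip(INPUTS + [TARGET], line.split(",")))
         for line in DATA.splitlines()]

# Assumed toy hypothesis space: one hypothesis (F, v) for every input
# feature F and every value v of F seen in the training examples.
HYPOTHESES = sorted({(F, e[F]) for e in TRAIN for F in INPUTS})


def predict(h, e):
    """Hypothesis (F, v) predicts 'reads' exactly when e has value v for F."""
    F, v = h
    return "reads" if e[F] == v else "skips"


def errors(h):
    """Number of training examples that hypothesis h predicts incorrectly."""
    return sum(predict(h, e) != e[TARGET] for e in TRAIN)


# Option 1: the best hypothesis under the assumed measure (fewest errors).
best = min(HYPOTHESES, key=errors)

# Option 2: all hypotheses consistent with every training example.
consistent = [h for h in HYPOTHESES if errors(h) == 0]

e19 = {"Author": "unknown", "Thread": "new", "Length": "long",
       "Where Read": "work"}
print(best, errors(best))  # ('Length', 'short') 2
print(consistent)          # []
print(predict(best, e19))  # skips
```

In this small space no hypothesis is consistent with all eighteen training examples, so the second option yields nothing and only the first gives a prediction. A richer hypothesis space would be needed for a consistent hypothesis to exist.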

One exception to this paradigm is case-based reasoning, which uses the examples directly.