Defining a Network Topology
Introduction: Before training can begin, the user must decide on the network topology by specifying the number of units in the input layer, the number of hidden layers (if more than one), the number of units in each hidden layer, and the number of units in the output layer.
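As a concrete illustration of such a specification (the numbers here are hypothetical, chosen to match the encoding example in the next paragraph), a topology can be recorded simply as the list of layer sizes, in Python:

# Hypothetical topology: 3 input units (one per domain value of a discrete attribute),
# one hidden layer of 2 units, and a single output unit representing two classes.
layer_sizes = (3, 2, 1)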
Normalizing the input values for each attribute measured in the training tuples will help speed up the learning phase. Typically, input values are normalized so as to fall between 0.0 and 1.0. Discrete-valued attributes may be encoded such that there is one input unit per domain value. For example, if an attribute A has three possible or known values, namely {a0, a1, a2}, then we may assign three input units to represent A. That is, we may have, say, I0, I1, I2 as input units. Each unit is initialized to 0. If A = a0, then I0 is set to 1; if A = a1, I1 is set to 1; and so on. Neural networks can be used for both classification (to predict the class label of a given tuple) and prediction (to predict a continuous-valued output). For classification, one output unit may be used to represent two classes (where the value 1 represents one class and the value 0 represents the other). If there are more than two classes, then one output unit per class is used.
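The following minimal Python sketch illustrates this preprocessing; the attribute names and value ranges are made up for the example.

import numpy as np

# Hypothetical continuous attribute values taken from the training tuples.
income = np.array([30_000.0, 52_000.0, 75_000.0, 120_000.0])

# Min-max normalize so the values fall between 0.0 and 1.0.
income_norm = (income - income.min()) / (income.max() - income.min())

# Discrete-valued attribute A with domain {a0, a1, a2}:
# one input unit per domain value, set to 1 for the observed value, else 0.
domain = ["a0", "a1", "a2"]
observed = "a1"
I = np.array([1.0 if v == observed else 0.0 for v in domain])  # -> [0., 1., 0.]

print(income_norm, I)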
Network design is a trial-and-error process and may affect the accuracy of the resulting trained network. The initial values of the weights may also affect the resulting accuracy. If a trained network's accuracy is not considered acceptable, it is common to repeat the training process with a different network topology or a different set of initial weights. Cross-validation techniques for accuracy estimation can be used to help decide when an acceptable network has been found. A number of automated techniques have been proposed that search for a “good” network structure. These typically use a hill-climbing approach that starts with an initial structure that is selectively modified.
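For example, candidate topologies can be scored with cross-validated accuracy. The sketch below assumes scikit-learn and a synthetic data set, and it simply keeps the best-scoring hidden-layer configuration from a small candidate list rather than performing a full hill-climbing search.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

best_score, best_topology = -np.inf, None
for hidden in [(5,), (10,), (10, 5)]:  # candidate hidden-layer sizes (illustrative)
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    score = cross_val_score(net, X, y, cv=5).mean()  # cross-validated accuracy estimate
    if score > best_score:
        best_score, best_topology = score, hidden

print(f"best topology {best_topology} with cross-validated accuracy {best_score:.3f}")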
Back propagation: Back propagation learns by iteratively processing a data set of training tuples, comparing the network’s prediction for each tuple with the actual known target value. The target value may be the known class label of the training tuple (for classification problems) or a continuous value (for prediction). For each training tuple, the weights are modified so as to minimize the mean squared error between the network’s prediction and the actual target value. These modifications are made in the “backwards” direction, that is, from the output layer, through each hidden layer down to the first hidden layer (hence the name back propagation). Although it is not guaranteed, in general the weights will eventually converge, and the learning process stops. The algorithm is summarized in Figure 6.16. The steps involved are expressed in terms of inputs, outputs, and errors, and may seem awkward if this is your first look at neural network learning. The steps are described below.
Algorithm: Back propagation. Neural network learning for classification or prediction, using the back propagation algorithm.
Input:
- D, a data set consisting of the training tuples and their associated target values;
- l, the learning rate;
- network, a multilayer feed-forward network.
Output: A trained neural network.
Method:
(1) Initialize all weights and biases in network;
(2) while terminating condition is not satisfied {
(3) for each training tuple X in D {
(4) // Propagate the inputs forward:
(5) for each input layer unit j {
(6) Oj = Ij; // output of an input unit is its actual input value
(7) for each hidden or output layer unit j {
(8) Ij = Σi wij Oi + θj; // compute the net input of unit j with respect to the previous layer, i
(9) Oj = 1/(1 + e^(-Ij)); } // compute the output of each unit j
(10) // Back propagate the errors:
(11) for each unit j in the output layer
(12) Errj = Oj(1 - Oj)(Tj - Oj); // compute the error
(13) for each unit j in the hidden layers, from the last to the first hidden layer
(14) Errj = Oj(1 - Oj) Σk Errk wjk; // compute the error with respect to the next higher layer, k
(15) for each weight wij in network {
(16) Δwij = (l) Errj Oi; // weight increment
(17) wij = wij + Δwij; } // weight update
(18) for each bias θj in network {
(19) Δθj = (l) Errj; // bias increment
(20) θj = θj + Δθj; } // bias update }}
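To make the pseudocode concrete, here is a minimal NumPy sketch of case-by-case (stochastic) back propagation for a network with a single hidden layer. The layer size, the random-initialization range, the learning rate l, and the terminating condition (a fixed number of epochs) are choices made for illustration only; the XOR data at the end is likewise just a toy usage example.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Oj = 1 / (1 + e^(-Ij))
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(D, T, n_hidden=4, l=0.5, epochs=5000):
    """Case-by-case back propagation with one hidden layer.
    D: training tuples (n_samples, n_inputs), values normalized to [0, 1]
    T: target values   (n_samples, n_outputs)
    """
    n_in, n_out = D.shape[1], T.shape[1]
    # (1) Initialize all weights and biases to small random values.
    W_h = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
    b_h = rng.uniform(-0.5, 0.5, n_hidden)
    W_o = rng.uniform(-0.5, 0.5, (n_hidden, n_out))
    b_o = rng.uniform(-0.5, 0.5, n_out)

    for _ in range(epochs):                      # (2) terminating condition: fixed epochs
        for X, t in zip(D, T):                   # (3) for each training tuple
            # (4)-(9) Propagate the inputs forward.
            O_h = sigmoid(X @ W_h + b_h)         # hidden-layer outputs
            O_o = sigmoid(O_h @ W_o + b_o)       # output-layer outputs
            # (10)-(14) Back propagate the errors.
            Err_o = O_o * (1 - O_o) * (t - O_o)      # Errj for output units
            Err_h = O_h * (1 - O_h) * (W_o @ Err_o)  # Errj for hidden units
            # (15)-(20) Update weights and biases: Δwij = (l) Errj Oi, Δθj = (l) Errj.
            W_o += l * np.outer(O_h, Err_o)
            b_o += l * Err_o
            W_h += l * np.outer(X, Err_h)
            b_h += l * Err_h
    return W_h, b_h, W_o, b_o

# Toy usage example: learn XOR.
D = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W_h, b_h, W_o, b_o = train_backprop(D, T)
pred = sigmoid(sigmoid(D @ W_h + b_h) @ W_o + b_o)
print(np.round(pred, 2))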