Neural Network & Fuzzy Systems

Genetic Algorithms

Introduction:- A typical example of a heuristic method for problem solving is the genetic approach used in what are known as genetic algorithms. Genetic algorithms solve complex combinatorial and organizational problems with many variants by employing an analogy with nature's evolution. They were introduced by John Holland (1975) and further developed by him and other researchers. Nature's diversity of species is tremendous. How does nature evolve such an enormous variety of variants; in other words, how does nature solve the optimization problem of perfecting its species? One answer to this question may be found in Charles Darwin's theory of evolution. The most important terms used in genetic algorithms are analogous to the terms used to explain the evolutionary processes: gene, chromosome, population, crossover, mutation, selection, and fitness. Genetic algorithms are usually illustrated by game problems. One such is a version of the "mastermind" game, in which one of two players thinks up a number (e.g., 001010) and the other has to find it out with a minimal number of questions. Each question is a hypothesis (solution) to which the first player replies with another number indicating the number of correctly guessed figures. This number is the criterion for selecting the most promising (prospective) variants, which will take the second player to eventual success. If there is no improvement after a certain number of steps, this is a hint that a change should be introduced. Such a change is called mutation. In this game success is achieved after 16 questions, which is four times faster than checking all the possible combinations, as there are 2⁶ = 64 possible variants. There is no need for mutation in this example. If it were needed, it could be introduced by changing a bit (a gene) selected at random.
Mutation would have been necessary if, for example, there were a 0 in the third bit of all three initial individuals: no matter how the most prospective individuals are combined by copying precise parts of their code, we could never change this bit into 1. Mutation takes evolution out of a "dead end."
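The mastermind example above can be sketched as a tiny genetic algorithm. This is an illustrative sketch, not the exact procedure from the text: the population size of four, the selection of the two fittest hypotheses as parents, and the per-bit mutation rate are all assumptions chosen for the demonstration.

```python
import random

TARGET = [0, 0, 1, 0, 1, 0]  # the secret number from the game example

def fitness(guess):
    # the first player's reply: number of correctly guessed figures
    return sum(g == t for g, t in zip(guess, TARGET))

def crossover(a, b):
    # combine two prospective variants by copying a precise part of their code
    cut = random.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def mutate(guess, rate=0.1):
    # flip a randomly selected bit (gene) to escape a "dead end"
    return [1 - g if random.random() < rate else g for g in guess]

random.seed(0)
population = [[random.randint(0, 1) for _ in range(6)] for _ in range(4)]
questions = len(population)
while max(fitness(g) for g in population) < 6:
    # select the two most promising variants as parents for the next generation
    parents = sorted(population, key=fitness, reverse=True)[:2]
    population = [mutate(crossover(*random.sample(parents, 2))) for _ in range(4)]
    questions += 4

print(questions)  # hypotheses tried before success
```

Because only the fittest hypotheses are recombined, the search typically needs far fewer questions than the 64 exhaustive checks.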

Human Brain

Introduction:- The human nervous system may be viewed as a three-stage system, as depicted in the block diagram. Central to the system is the brain, represented by the neural (nerve) net, which continually receives information, perceives it, and makes appropriate decisions. The arrows pointing from left to right indicate the forward transmission of information-bearing signals through the system; the arrows pointing from right to left signify the presence of feedback in the system. The receptors convert stimuli from the human body or the external environment into electrical impulses that convey information to the neural net (brain). The effectors convert electrical impulses generated by the neural net into discernible responses as system outputs. Synapses are elementary structural and functional units that mediate the interactions between neurons. The most common kind of synapse is a chemical synapse, which operates as follows: a presynaptic process liberates a transmitter substance that diffuses across the synaptic junction between neurons and then acts on a postsynaptic process. Thus a synapse converts a presynaptic electrical signal into a chemical signal and then back into a postsynaptic electrical signal. In traditional descriptions of neural organization, it is assumed that a synapse is a simple connection that can impose excitation or inhibition, but not both, on the receptive neuron. Axons, the transmission lines, and dendrites, the receptive zones, constitute two types of cell filaments that are distinguished on morphological grounds: an axon has a smoother surface, fewer branches, and greater length, whereas a dendrite has an irregular surface and more branches.

Expert Systems

Introduction:- Expert systems are knowledge-based systems that contain expert knowledge. For example, an expert system for diagnosing car faults has a knowledge base containing rules for checking a car and finding faults in the same way an engineer would do it. An expert system is a program that can provide expertise for solving problems in a defined application area in the way experts do. Expert systems have facilities for representing existing expert knowledge, accommodating existing databases, learning and accumulating knowledge during operation, learning new pieces of knowledge from existing databases, making logical inferences, making decisions and giving recommendations, communicating with users in a friendly way (often in a restricted natural language), and explaining their "behaviour" and decisions. The explanation feature often helps users to understand and trust the decisions made by an expert system. Learning in expert systems can be achieved by using machine-learning methods and artificial neural networks. Expert systems have been used successfully in almost every field of human activity, including engineering, science, medicine, agriculture, manufacturing, education and training, business and finance, and design. By using existing information technologies, expert systems for performing difficult and important tasks can be developed quickly, maintained cheaply, used effectively at many sites, improved easily, and refined during operation to accommodate new situations and facts.
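A minimal sketch of how such a rule base might drive inference, assuming a simple forward-chaining strategy; the symptom and fault names below are hypothetical and not taken from any real diagnostic system:

```python
# Hypothetical rules for a car-fault expert system: each rule maps a set of
# required facts (symptoms) to a conclusion, as an engineer might check them.
RULES = [
    ({"engine_cranks", "no_spark"}, "fault_ignition_coil"),
    ({"engine_silent", "lights_dim"}, "fault_flat_battery"),
    ({"fault_flat_battery"}, "recommend_recharge_or_replace_battery"),
]

def forward_chain(facts):
    """Repeatedly fire every rule whose conditions are all satisfied."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # infer a new fact
                changed = True
    return facts

inferred = forward_chain({"engine_silent", "lights_dim"})
print(inferred)
```

Note how the third rule chains on the conclusion of the second, so a diagnosis leads directly to a recommendation, mirroring the decision-plus-recommendation facility described above.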

Initial Configuration Of A Multilayer Perceptron

Number of layers: Two or three may often do the job, but more are also used. A network should have one layer of input neurons and one layer of output neurons, which results in at least two layers. As we have seen during the examination of linear separability, we additionally need at least one hidden layer of neurons if our problem is not linearly separable (which is, as we have seen, very likely). It is possible to prove mathematically that an MLP with one hidden neuron layer is already capable of approximating arbitrary functions with any accuracy, but it is necessary to discuss not only the representability of a problem by means of a perceptron but also its learnability. Representability means that a perceptron can, in principle, realize a mapping; learnability means that we are also able to teach it. In this respect, experience shows that two hidden neuron layers (or three trainable weight layers) can be very useful for solving a problem, since many problems can be represented by one hidden layer but are very difficult to learn with it. One should keep in mind that any additional layer generates additional local minima of the error function in which we can get stuck. All things considered, a promising approach is to try one hidden layer first and, if that fails, to retry with two layers. Only if that fails should one consider more layers. However, given the increasing computational power of current computers, deep networks with many layers are also used with success. The number of neurons has to be found by experiment.
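A minimal sketch of the recommended starting point, a perceptron with a single hidden layer, trained here with plain backpropagation on XOR (the classic not-linearly-separable problem); the number of hidden neurons, the learning rate, and the epoch count are assumptions to be tuned by experiment:

```python
import math, random

random.seed(1)

# XOR cannot be solved without a hidden layer
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

H = 4          # hidden neurons: has to be found by experiment
LR = 0.5       # global learning rate
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # input weights + bias
w2 = [random.uniform(-1, 1) for _ in range(H + 1)]                  # output weights + bias

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x):
    h = [sig(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
    o = sig(sum(w2[j] * h[j] for j in range(H)) + w2[H])
    return h, o

for _ in range(20000):
    for x, t in DATA:
        h, o = forward(x)
        d_o = (o - t) * o * (1 - o)                 # output delta
        for j in range(H):
            d_h = d_o * w2[j] * h[j] * (1 - h[j])   # hidden delta
            w2[j] -= LR * d_o * h[j]
            for k in range(2):
                w1[j][k] -= LR * d_h * x[k]
            w1[j][2] -= LR * d_h                    # hidden bias
        w2[H] -= LR * d_o                           # output bias

print([round(forward(x)[1]) for x, _ in DATA])
```

If training with one hidden layer stalls in a local minimum, the advice above applies: retry with a different initialization or a second hidden layer before going deeper.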

Resilient Backpropagation

Introduction:- Resilient backpropagation (Rprop) is an extension of backpropagation of error. On the one hand, users of backpropagation can choose a bad learning rate. On the other hand, the further the weights are from the output layer, the slower backpropagation learns. For this reason, Martin Riedmiller et al. enhanced backpropagation and called their version resilient backpropagation. Learning rates: By default, backpropagation uses a learning rate which is selected by the user and applies to the entire network; it remains static until it is manually changed. Rprop pursues a completely different approach: there is no global learning rate. First, each weight w_ij has its own learning rate η_ij; second, these learning rates are not chosen by the user but are set automatically by Rprop itself; third, the weight changes are not static but are adapted at each time step of Rprop. Weight change: When using backpropagation, weights are changed proportionally to the gradient of the error function. At first glance, this is really intuitive. However, we thereby incorporate every jagged feature of the error surface into the weight changes, and it is at least questionable whether this is always useful. Here, too, Rprop takes a different approach: the amount of the weight change Δw_ij simply corresponds directly to the automatically adjusted learning rate η_ij. Thus the change in weight is not proportional to the gradient; it is only influenced by the sign of the gradient. Until now we do not know exactly how the η_ij are adapted at run time, but let me anticipate that the resulting process looks considerably less rugged than an error function.
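The adaptation of the per-weight step sizes can be sketched as follows. This is a simplified version of the Rprop rule (the full algorithm also zeroes the stored gradient after a sign change); the acceleration and brake factors 1.2 and 0.5 and the step-size bounds are the commonly used defaults:

```python
# Rprop update for a single weight: only the SIGN of the gradient is used,
# and each weight keeps its own step size eta, adapted at every time step.
ETA_PLUS, ETA_MINUS = 1.2, 0.5   # commonly used acceleration/brake factors
ETA_MAX, ETA_MIN = 50.0, 1e-6    # bounds on the per-weight step size

def sign(x):
    return (x > 0) - (x < 0)

def rprop_step(w, grad, prev_grad, eta):
    if grad * prev_grad > 0:        # same direction as last step: accelerate
        eta = min(eta * ETA_PLUS, ETA_MAX)
    elif grad * prev_grad < 0:      # direction changed: we jumped over a minimum
        eta = max(eta * ETA_MINUS, ETA_MIN)
    w = w - sign(grad) * eta        # change depends only on the sign, not the magnitude
    return w, eta

# demonstration: minimize f(w) = w^2 (gradient 2w), starting from w = 3
w, eta, prev = 3.0, 0.1, 0.0
for _ in range(100):
    g = 2 * w
    w, eta = rprop_step(w, g, prev, eta)
    prev = g
print(abs(w))
```

The step size grows while the gradient keeps its sign and shrinks when the sign flips, so jagged gradient magnitudes never enter the weight change, exactly as described above.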

Training Recurrent Networks

Introduction:- We use a Jordan network without a hidden neuron layer for our training attempts, so that the output neurons can directly provide input. This approach is a strong simplification, because generally more complicated networks are used, but it does not change the learning principle. Unfolding in time:- Consider backpropagation of error, which backpropagates the delta values. In a recurrent network the delta values would propagate cyclically through the network again and again, which makes the training more difficult. On the one hand we cannot know which of the many generated delta values for a weight should be selected for training, i.e., which values are useful. On the other hand we cannot definitely know when learning should be stopped. The advantage of recurrent networks is their great state dynamics within the network; the disadvantage is that these dynamics are also granted to the training and therefore make it difficult. One learning approach is to unfold the temporal states of the network: recursions are deleted by putting a similar network above the context neurons, i.e., the context neurons become, in a manner of speaking, the output neurons of the attached network. More generally speaking, we have to backtrack the recurrences and place "earlier" instances of neurons in the network, thus creating a larger but forward-oriented network without recurrences. This enables training a recurrent network with any training strategy developed for non-recurrent ones. Here the input is entered as teaching input into every "copy" of the input neurons. This can be done for a discrete number of time steps. This training paradigm is called unfolding in time.
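The idea of unfolding can be made concrete with a single recurrent neuron. This is a minimal sketch with a linear state update chosen for simplicity: the recurrent weight appears once per time-step "copy", ordinary backpropagation runs through the unfolded chain, and the gradient contributions of all copies are summed. The result is checked against a finite-difference estimate:

```python
# Unfolding in time: a single recurrent neuron s_t = w * s_{t-1} + x_t is
# copied once per time step, giving a forward-oriented chain that ordinary
# backpropagation can train.  The recurrent weight w appears in every copy,
# so its gradient is the sum of the gradients of all copies.

def forward(w, xs, s0=0.0):
    states = [s0]
    for x in xs:
        states.append(w * states[-1] + x)   # one "copy" of the neuron per step
    return states

def grad_w(w, xs, target):
    states = forward(w, xs)
    # error E = (s_T - target)^2; backpropagate through the unfolded chain
    delta = 2 * (states[-1] - target)
    g = 0.0
    for t in range(len(xs), 0, -1):
        g += delta * states[t - 1]          # contribution of this copy of w
        delta *= w                          # delta flows back one time step
    return g

xs, target, w = [1.0, 0.5, -0.5], 2.0, 0.8
analytic = grad_w(w, xs, target)
eps = 1e-6
numeric = ((forward(w + eps, xs)[-1] - target) ** 2
           - (forward(w - eps, xs)[-1] - target) ** 2) / (2 * eps)
print(analytic, numeric)   # the two gradients agree
```

Because the chain is unrolled for a fixed, discrete number of time steps, there is no ambiguity about which delta value trains the weight or when to stop: every copy contributes once.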

Hierarchical Multimodular Network Architectures For Playing Games

Introduction:- This approach is based on building different layers of neural networks that perform different consecutive tasks in the process of finding the best next move. For example, layer 1, an input layer, receives data from the board; layer 2, an intermediate layer of receptor neurons, recognizes patterns, features, and situations on the game board; and layer 3, an output layer of effector neurons, selects the best next move. The patterns on the board are detected, and the appropriate neurons in the intermediate layer are activated accordingly. This requires prewired connections: the weights are preset and no training is required. A connectionist model of this type was presented in Rumelhart et al. (1986a). Having established the links between the neurons of the different layers in the network, the system will "play," performing the algorithm. The representation of game strategies in the suggested connectionist model is carried into effect in the following way. The operator, creating or perfecting the network, defines the most important configurations (characteristics) generated in the game process. The presence or absence of one such characteristic determines the need for performing, or refraining from performing, certain moves. For every major characteristic of the game, a neural element sensitive to the appearance (or absence) of the respective configuration on the game board is constructed in the second layer of the network.
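A loose sketch of such a prewired three-layer network; the board contents, the characteristics, the candidate moves, and the preset weights below are all invented for illustration:

```python
# Layer 1 (input): facts read off the game board (illustrative only)
BOARD = {"own_piece_center", "opponent_threat_left"}

# Layer 2 (receptor neurons): each one fires on one board characteristic
FEATURE_DETECTORS = {
    "center_controlled": {"own_piece_center"},
    "threat_on_left":    {"opponent_threat_left"},
}

# Layer 3 (effector neurons): preset weights from characteristics to moves;
# no training is required
MOVE_WEIGHTS = {
    "defend_left": {"threat_on_left": 1.0, "center_controlled": 0.2},
    "push_center": {"center_controlled": 1.0},
}

# a receptor neuron activates when its configuration is present on the board
active = {f for f, pattern in FEATURE_DETECTORS.items() if pattern <= BOARD}

# each effector neuron sums the preset weights of the active characteristics
scores = {move: sum(w for f, w in weights.items() if f in active)
          for move, weights in MOVE_WEIGHTS.items()}
best = max(scores, key=scores.get)
print(best)   # the effector neuron with the highest activation selects the move
```

The operator tunes play purely by editing the detector patterns and the preset weights, which is the sense in which the strategy is "wired in" rather than learned.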

Using Som For Phoneme Recognition

The feature vectors obtained after signal processing, e.g., the mel-scale cepstrum coefficient vectors, can be used as training examples for training a SOM. Vectors representing allophonic realizations of the same phoneme are taken from different windows (frames) over one signal sample and from different signals, that is, different realizations of this phoneme. After enough training, every phoneme is represented on the SOM by the activation of some output nodes. One node fires when an input vector representing a segment of the allophonic realization of this phoneme is fed into the network. The outputs that react to the same phoneme pronounced differently are positioned closely. The outputs that react to similar phonemes are positioned in proximity on the map. This is due to the ability of the SOM to activate topologically close output neurons when similar input vectors are presented. This approach has been used to create phonemic maps for Finnish, Japanese, English, Bulgarian, and other languages. Figure A is a two-dimensional drawing of the coordinates of the neurons in a SOM that was labeled with eight English phonemes selected from digits pronounced by a small group of speakers. Figure B shows how allophonic segments of phonemes were segmented for training the SOM. It is clear from this drawing that the phonemes are well distinguished, though there are areas where the network cannot produce any meaningful classification. Instead of having a large, and therefore slow-to-process, single SOM, hierarchical models of SOMs can be used. Every SOM at the second level is activated when a corresponding neuron from the first level becomes active. The asymptotic computational complexity of recognition with the two-level hierarchical model is O(2 · n · m), where n is the number of inputs and m is the size of a single SOM. This is much less than the computational complexity O(n · m²) of a single SOM with a size of m² (here m = 16).
For a general r-level hierarchical model, the complexity is O(r · n · m). The first-level SOM is trained to recognize four classes of phonemes: a pause, a vocalized phoneme, a nonvocalized phoneme, and a fricative segment.
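The complexity saving of the hierarchy can be sketched by counting how many neurons are examined per input vector; the best-matching-unit search and the random weights below are illustrative only:

```python
import random
random.seed(0)

def bmu(som, x):
    """Best-matching unit: index of the neuron whose weight vector is closest to x."""
    def dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(som)), key=lambda i: dist(som[i]))

n, m = 26, 16   # n-dimensional feature vectors, SOM size m (m = 16 as in the text)
x = [random.random() for _ in range(n)]

# a single large map with m*m neurons: O(n * m^2) comparisons per input
big_som = [[random.random() for _ in range(n)] for _ in range(m * m)]

# two-level hierarchy: an m-neuron first level routes the input to one of m
# second-level maps of m neurons each: O(2 * n * m) comparisons per input
level1 = [[random.random() for _ in range(n)] for _ in range(m)]
level2 = [[[random.random() for _ in range(n)] for _ in range(m)] for _ in range(m)]

winner1 = bmu(level1, x)            # first-level winner selects a second-level SOM
winner2 = bmu(level2[winner1], x)   # refined classification within that SOM
print(m * m, 2 * m)                 # neurons examined: 256 vs. 32
```

Each added hierarchy level multiplies the cost by a constant m rather than the map size, which is where the O(r · n · m) figure for an r-level model comes from.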

Problem Identification And Choosing The Neural Network Model

Before starting to develop a solution to a given problem, several questions must be answered: What is the point of using a neural network for solving this specific problem? Why should a neural network be used? What are the benefits of using a network? The generic properties of different neural network types, and of connectionist models in general, must be known to answer these questions. Which of these properties are going to be useful for solving the problem? Problem identification also includes analysis of the existing problem knowledge. The problem knowledge may contain data and rules. If data are available, the independent variables (the input variables) should be clearly distinguished from the dependent variables (the output variables) in order to choose a proper neural network architecture and a learning method. These variables can be discrete, continuous, linguistic, Boolean, etc. In some cases, only input variables are present. In this case, unsupervised learning algorithms and appropriate neural networks can be used to cluster, to conceptualize, and to recognize patterns in the domain data. If rules are available in the problem knowledge, they can be implemented in an appropriate neural network structure. Fuzzy rules are especially suitable for connectionist implementation. If both rules and data are available, then hybrid systems, which incorporate both neural networks and symbolic AI techniques, can be used, or a neural network can be trained by using both data and rules, the latter being treated as hints or input-output associations.

Approximate Reasoning In Nps

Introduction:- The process of approximate reasoning in an NPS can be described as matching facts, represented by their truth degrees, against the antecedents of rules. The truth values (certainty degrees) of the new facts (conclusions) are then computed. This is repeated as a chain: new facts are inferred, the newly inferred facts (and of course the old ones) are matched against the productions again, and so forth. The reasoning process is nonmonotonic; processing new facts may decrease the certainty degree of an already inferred fact. The main idea of controlling approximate reasoning in an NPS is that by tuning the inference control parameters we can adjust the reasoning process of a particular production system to the requirements of the experts. Approximate reasoning in an NPS is a consequence of its partial match. For example, by using the noise tolerance coefficients Q_i, an NPS can separate facts that are relevant to the decision process from irrelevant facts. Rules with different sensitivity coefficients P_i react differently to the same set of relevant facts. An NPS can work with missing data: one rule may fire even when some facts are not known. By adjusting the degrees of importance DI_ij we declare that some condition elements are more important than others, and rules can fire if only the important supporting facts are known. Adjusting the inference control parameters facilitates the process of choosing an appropriate inference for a particular production system.
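A loose sketch of how a partial match might combine these control parameters; the formulas below are invented for illustration and are not the exact NPS definitions:

```python
# Illustrative partial match for one NPS-style rule: condition truth degrees
# are combined with importance weights DI; a noise-tolerance threshold Q
# discards weak evidence; a sensitivity coefficient P scales the certainty
# of the conclusion.  (Invented formulas, for illustration only.)

def match(truth_degrees, DI, Q, P):
    # ignore facts whose truth degree does not exceed the noise tolerance Q
    relevant = [(t, w) for t, w in zip(truth_degrees, DI) if t > Q]
    if not relevant:
        return 0.0
    # importance-weighted support of the rule's condition elements
    support = sum(t * w for t, w in relevant) / sum(w for _, w in relevant)
    return min(1.0, P * support)   # sensitivity scales the conclusion certainty

# three condition elements, the last one unknown (truth degree 0): the rule
# can still fire because the important supporting facts are known
certainty = match([0.9, 0.7, 0.0], DI=[1.0, 0.5, 0.2], Q=0.1, P=1.1)
print(round(certainty, 3))
```

Raising Q filters out more noisy facts, raising P makes the rule react more strongly to the same evidence, and the DI weights decide which missing facts the rule can tolerate, which is the tuning behaviour described above.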

Hybrid Systems For Speech Recognition

Description:- The speech recognition process can be represented as a two-level hierarchical process, consisting of low-level subword recognition, for example phoneme recognition, and high-level recognition of words, sentences, and contextual information, along with language analysis; each level can itself be represented recursively as many further levels of processing. Different combinations of techniques are used for low-level and high-level processing, including the following: Template matching and dynamic time warping for low-level processing. Speech recognition using template matching involves comparing an unclassified input pattern to a dictionary of patterns (templates) and deciding which pattern in the dictionary the input pattern is closest to. Distance measures are used to decide the best match. Before the matching is done it is necessary to perform some time alignment between the input pattern and each reference template. Owing to the variability of speech there will be local and global variations in the time scale of two spoken examples of the same word, regardless of whether the two examples were uttered by the same speaker. An effective technique utilized in computer speech recognition for time-aligning two patterns is a nonlinear time-normalizing technique, dynamic time warping. Speech recognition systems based on dynamic time warping have been used successfully for isolated word recognition. Usually the vocabulary is medium-sized or smaller (i.e., < 100 words) because the dictionary of reference templates takes up a lot of storage space. Dynamic time-warping systems are also usually speaker-dependent. For speaker-independent systems, reference templates have to be collected from a large number of people; these are then clustered to form a representative pattern for each recognition unit (Owens 1993).
Speech recognition systems utilizing dynamic time warping have also been used for connected speech recognition and for recognizing strings of words such as a series of digits (e.g., a telephone number). A limitation of dynamic time warping, when the recognition units are words, is that its time-aligning capabilities can lead to confusion between words whose principal distinguishing factor is the duration of a vowel, for example, "league" and "leek."
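Dynamic time warping itself can be stated as a short dynamic program over the two sequences; a minimal sketch using an absolute-difference local distance (real systems compare feature vectors, not scalars):

```python
# Dynamic time warping: minimal cumulative distance between two sequences
# when nonlinear time alignment is allowed.
def dtw(a, b):
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])        # local distance measure
            # extend the cheapest of the three allowed warping-path predecessors
            D[i][j] = cost + min(D[i - 1][j],      # stretch sequence a
                                 D[i][j - 1],      # stretch sequence b
                                 D[i - 1][j - 1])  # diagonal step
    return D[n][m]

# two "utterances" of the same pattern, one spoken twice as slowly
fast = [0, 1, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
print(dtw(fast, slow))   # small distance despite the different durations
```

This also exposes the vowel-duration limitation noted above: because stretching is nearly free, two words that differ mainly in how long a vowel is held get artificially similar scores.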

The Neocognitron

Introduction:- The neocognitron design evolved from an earlier model called the cognitron, and there are several versions of the neocognitron itself. The system was designed to recognize the numerals 0 through 9, regardless of where they are placed in the field of view of the retina. Moreover, the network has a high degree of tolerance to distortion of the character and is fairly insensitive to the size of the character. This first architecture contains only feedforward connections. Functional Description:- The PEs (processing elements) of the neocognitron are organized into modules that we shall refer to as levels. Each level consists of two layers: a layer of simple cells, or S-cells, followed by a layer of complex cells, or C-cells. Each layer, in turn, is divided into a number of planes, each of which consists of a rectangular array of PEs. On a given level, the S-layer and the C-layer may or may not have the same number of planes. All planes on a given layer have the same number of PEs; however, the number of PEs on the S-planes can be different from the number of PEs on the C-planes at the same level. Moreover, the number of PEs per plane can vary from level to level. There are also PEs called Vs-cells and Vc-cells. We construct a complete network by combining an input layer, which we shall call the retina, with a number of levels in a hierarchical fashion. The interconnection strategy is unlike that of networks that are fully interconnected between layers, such as the backpropagation network. Each layer of simple cells acts as a feature extraction system that uses the layer preceding it as its input layer. On the first S-layer, the cells on each plane are sensitive to simple features on the retina, in this case line segments at different orientation angles. Each S-cell on a single plane is sensitive to the same feature, but at a different location on the input layer. S-cells on different planes respond to different features.
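The behaviour of a single S-plane can be sketched as one small feature mask applied at every retina location; the retina contents and the vertical-line feature below are invented for illustration, and real S-cells add inhibition and a nonlinearity omitted here:

```python
# Sketch of one S-plane: every S-cell applies the SAME feature mask (here a
# vertical line segment) at a DIFFERENT location on the retina, so the plane
# responds to that feature wherever it appears.
RETINA = [
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
MASK = [[1], [1], [1]]   # 3x1 vertical-line feature

def s_plane(retina, mask):
    mh, mw = len(mask), len(mask[0])
    out = []
    for i in range(len(retina) - mh + 1):
        row = []
        for j in range(len(retina[0]) - mw + 1):
            # an S-cell's activation: overlap of its receptive field with the feature
            act = sum(mask[a][b] * retina[i + a][j + b]
                      for a in range(mh) for b in range(mw))
            row.append(act)
        out.append(row)
    return out

plane = s_plane(RETINA, MASK)
print(plane)   # strongest response where the vertical segment lies
```

Other S-planes on the same layer would use different masks (line segments at other orientation angles), which is how the first level extracts a bank of simple features in parallel.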

Back-propagation Network

Introduction:- When problem knowledge includes explicit (fuzzy) rules, a connectionist system can be trained with them as input-output associations, where the input patterns are the antecedent parts of the rules and the output patterns are the consequent parts. A fuzzy association A → B, where A and B are fuzzy values defined, for example, by their membership functions, can be memorized in an n-input, m-output neural network, where n is the cardinality of the universe Ux, m is the cardinality of the universe Uy, and x and y are fuzzy variables with corresponding fuzzy values A and B. This is the basis for using connectionist architectures for reasoning over fuzzy rules of the type IF x is A, THEN y is B. An MLP neural network can be trained with a set of fuzzy rules, the rules being treated as input-output training examples. When a new fuzzy set A' is supplied as a fuzzy input, the network will produce an output vector that is the desired fuzzy output B'. The generalized modus ponens law can thus be realized in a connectionist way. The following inference laws can also be satisfied by the same neural network, subject to a small error: A → B (modus ponens); Very A → Very B; More or less A → More or less B. A method for implementing multiantecedent fuzzy rules on a single neural network is introduced in Kosko (1992) and Kasabov (1993). The dimension of the input vector is the sum of the discrete cardinalities of the universes of all the fuzzy input variables. The dimension of the output vector is the sum of the corresponding discrete cardinalities of the universes of all the output fuzzy variables in the fuzzy rules. The fuzzy rules are assumed to have the same input and output variables but different combinations of their fuzzy values. If OR connectives are used in a rule, it may be necessary to generate more training examples based on combinations of the antecedent parts of the rules, as in the Bank Loan Decision problem.
The MLP consists of 33 input nodes, 11 intermediate nodes, and 11 output nodes. The inferred bank loan fuzzy decision values for the same three bank loan applicants as used in the example above, but here represented as fuzzy input values, are given as fuzzy membership functions in figure 5.8. The experiments show that the decisions inferred by the neural network for the three representative cases defined as fuzzy sets are correct. The ambiguity in the third solution vector clearly suggests a 'not known' decision.
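The encoding of one rule as a training pair can be sketched as follows; the universes, their cardinalities, and the triangular membership functions are assumptions chosen for illustration (the Bank Loan network above uses 33 inputs and 11 outputs built the same way):

```python
# Sketch: turning the fuzzy rule "IF x is A THEN y is B" into one
# input-output training pair by discretizing the membership functions of A
# and B over their universes.
def triangle(u, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if u <= a or u >= c:
        return 0.0
    return (u - a) / (b - a) if u <= b else (c - u) / (c - b)

Ux = [0, 1, 2, 3, 4, 5]        # universe of x, cardinality n = 6
Uy = [0, 1, 2, 3]              # universe of y, cardinality m = 4

A = [triangle(u, 1, 2.5, 4) for u in Ux]   # fuzzy value A of x, as an n-vector
B = [triangle(u, 0, 1, 2) for u in Uy]     # fuzzy value B of y, as an m-vector

# one training example for an n-input, m-output network
x_pattern, y_pattern = A, B
print(x_pattern)
print(y_pattern)
```

Feeding the trained network a slightly shifted membership vector A' would then yield an output vector approximating B', which is how the generalized modus ponens is realized connectionistically.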