Data Mining & Data Warehousing

Genetic Algorithms

Introduction:-A typical example of a heuristic method for problem solving is the genetic approach used in what is known as genetic algorithms. Genetic algorithms solve complex combinatorial and organizational problems with many variants, by employing analogy with nature's evolution. Genetic algorithms were introduced by John Holland (1975) and further developed by him and other researchers. Nature’s diversity of species is tremendous. How does mankind evolve into the enormous variety of variants—in other words, how does nature solve the optimization problem of perfecting mankind? One answer to this question may be found in Charles Darwin's theory of evolution. The most important terms used in the genetic algorithms are analogous to the terms used to explain the evolutionary processes. They are:

  • Gene—a basic unit, which controls a property of an individual.
  • Chromosome—a string of genes; it is used to represent an individual, or a possible solution of a problem in the solution space.
  • Population—a collection of individuals.
  • Crossover (mating) operation—substrings of different individuals are taken and new strings(offsprings) are produced.
  • Mutation—random change of a gene in a chromosome.
  • Fitness (goodness) function—a criterion which evaluates each individual.
  •  Selection—a procedure for choosing a part of the population that will continue the process ofsearching for the best solution, while the other part of the population "dies".

Genetic algorithms are usually illustrated by game problems. Such is a version of the "mastermind" game, in which one of two players thinks up a number (e.g., 001010) and the other has to find it out with a minimal number of questions. Each question is a hypothesis (solution) to which the first player replies

With another number indicating the number of correctly guessed figures. This number is the criterion for the selection of the most promising or prospective variant which will take the second player to eventual success. If there is no improvement after a certain number of steps, this is a hint that a change should be introduced. Such change is called mutation. In this game success is achieved after 16 questions, which is four times faster than checking all the possible combinations, as there are 26 = 64 possible variants. There is no need for mutation in this example. If it were needed, it could be introduced by changing a bit (a gene) by random selection. Mutation would have been necessary if, for example, there was 0 in the third bit of all three initial individuals, because no matter how the most prospective individuals are combined, by copying a precise part of their code we can never change this bit into 1.Mutation takes evolution out of a "dead end."

Four parameters are important for any genetic algorithm:

  1. The encoding scheme, that is, how to encode the problem in terms of genetic algorithms—what to choose for genes, how to construct the chromosomes, etc.
     
  2. The population size—how many possible solutions should be kept for further development
     
  3. The crossover operations—how to combine old individuals and produce new, more prospective ones
     
  4. The mutation heuristic—"when" and "how" to apply mutation.