The Candidate-elimination Algorithm

Introduction: The candidate elimination algorithm keeps two lists of hypotheses consistent with the training data:

(i) The list of most specific hypotheses S and,

(ii) The list of most general hypotheses G. This is enough to derive the whole version space VS.

Steps:

Initialize G to the set of maximally general hypotheses in H
Initialize S to the set of maximally specific hypotheses in H
For each training example X do

If X is positive: generalize S if necessary
If X is negative: specialize G if necessary

4. Output {G,S}

Step (a) Positive examples

If X is positive:

Remove from G any hypothesis inconsistent with X
For each hypothesis h in S not consistent with X
Remove h from S
Add all minimal generalizations of h consistent with X such that some member of G is more general than h
Remove from S any hypothesis more general than any other hypothesis in S

Step (b) Negative examples

If X is negative:

Remove from S any hypothesis inconsistent with X
For each hypothesis h in G not consistent with X
Remove g from G
Add all minimal generalizations of h consistent with X such that some member of S is more specific than h
Remove from G any hypothesis less general than any other hypothesis in G

The candidate elimination algorithm is guaranteed to converge to the right hypothesis provided the following:
a) No errors exist in the examples
b) The target concept is included in the hypothesis space H

If there exists errors in the examples:
a) The right hypothesis would be inconsistent and thus eliminated.
b) If the S and G sets converge to an empty space we have evidence that the true concept lies outside space H.