The Candidate-elimination Algorithm
Introduction: The candidate elimination algorithm keeps two lists of hypotheses consistent with the training data:
(i) The list of most specific hypotheses S and,
(ii) The list of most general hypotheses G. This is enough to derive the whole version space VS.
Steps:
- Initialize G to the set of maximally general hypotheses in H
- Initialize S to the set of maximally specific hypotheses in H
- For each training example X do
- If X is positive: generalize S if necessary
- If X is negative: specialize G if necessary
4. Output {G,S}
Step (a) Positive examples
If X is positive:
- Remove from G any hypothesis inconsistent with X
- For each hypothesis h in S not consistent with X
- Remove h from S
- Add all minimal generalizations of h consistent with X such that some member of G is more general than h
- Remove from S any hypothesis more general than any other hypothesis in S
Step (b) Negative examples
If X is negative:
- Remove from S any hypothesis inconsistent with X
- For each hypothesis h in G not consistent with X
- Remove g from G
- Add all minimal generalizations of h consistent with X such that some member of S is more specific than h
- Remove from G any hypothesis less general than any other hypothesis in G
The candidate elimination algorithm is guaranteed to converge to the right hypothesis provided the following:
a) No errors exist in the examples
b) The target concept is included in the hypothesis space H
If there exists errors in the examples:
a) The right hypothesis would be inconsistent and thus eliminated.
b) If the S and G sets converge to an empty space we have evidence that the true concept lies outside space H.