Knowledge-level Debugging

Just as in other software, knowledge bases can contain errors and omissions. Domain experts and knowledge engineers must be able to debug a knowledge base and add knowledge. In knowledge-based systems, debugging is difficult because the domain experts and users who have the domain knowledge required to detect a bug do not necessarily know anything about the internal workings of the system, nor do they want to. Standard debugging tools, such as traces of the execution, are of little use to them because they require knowledge of the mechanism by which the answer was produced. In this section, we show how the idea of semantics (page 159) can be exploited to provide powerful debugging facilities for knowledge-based systems. Whoever is debugging the system is required only to know the meaning of the symbols and whether specific atoms are true or not. This is the kind of knowledge that a domain expert and a user may have.

Knowledge-level debugging is the act of finding errors in knowledge bases with reference only to what the symbols mean. One of the goals of building knowledge-based systems that are usable by a range of domain experts is that a discussion about the correctness of a knowledge base should be a discussion about the knowledge domain. For example, debugging a medical knowledge base should be a question of medicine that medical experts, who are not experts in AI, can answer. Similarly, debugging a knowledge base about house wiring should be with reference to the particular house, not about the internals of the system reasoning with the knowledge base.

Four types of non-syntactic errors can arise in rule-based systems:

An incorrect answer is produced; that is, some atom that is false in the intended interpretation was derived.
Some answer was not produced; that is, the proof failed when it should have succeeded (some particular true atom was not derived).
The program gets into an infinite loop.
The system asks irrelevant questions.

Ways to debug the first three types of error are examined below. Irrelevant questions can be investigated using the why questions as described earlier.

Incorrect Answers: An incorrect answer is an answer that has been proved yet is false in the intended interpretation. It is also called a false-positive error. An incorrect answer can only be produced by a sound proof procedure if an incorrect definite clause was used in the proof.

Assume that whoever is debugging the knowledge base, such as a domain expert or a user, knows the intended interpretation of the symbols of the language and can determine whether a particular proposition is true or false in the intended interpretation. The person does not have to know how the answer was derived. To debug an incorrect answer, a domain expert needs only to answer yes-or-no questions.

Suppose there is an atom g that was proved yet is false in the intended interpretation. Then there must be a rule g ← a1 ∧ . . . ∧ ak in the knowledge base that was used to prove g. Either

one of the ai is false in the intended interpretation, in which case it can be debugged in the same way, or
all of the ai are true in the intended interpretation. In this case, the definite clause g ← a1 ∧ . . . ∧ ak must be incorrect.

Figure 5.6: An algorithm to debug incorrect answers

procedure Debug(g, KB)
    Inputs
        KB: a knowledge base
        g: an atom such that KB ⊢ g and g is false in the intended interpretation
    Output
        a clause in KB that is false
    Find definite clause g ← a1 ∧ . . . ∧ ak ∈ KB used to prove g
    for each ai do
        ask user whether ai is true
        if user specifies ai is false then
            return Debug(ai, KB)
    return g ← a1 ∧ . . . ∧ ak

This leads to an algorithm, presented in Figure 5.6, to debug a knowledge base when an atom that is false in the intended interpretation is derived. This only requires the person debugging the knowledge base to be able to answer yes-or-no questions.
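The procedure for incorrect answers can be sketched in Python. This is a minimal illustration, not a definitive implementation: the names `debug_incorrect`, `used`, and `is_true` are assumptions introduced here, the proof is assumed to be available as a record mapping each proved atom to the clause used to prove it, and the user's yes-or-no answers are simulated by a lookup table. The atoms are made up in the style of a house-wiring knowledge base.

```python
from typing import Callable, Dict, List, Tuple

# A definite clause is (head, body); an atomic clause has an empty body.
Clause = Tuple[str, List[str]]


def debug_incorrect(g: str,
                    used: Dict[str, Clause],
                    is_true: Callable[[str], bool]) -> Clause:
    """Return a clause that is false in the intended interpretation.

    g       -- an atom that was proved but is false in the intended interpretation
    used    -- hypothetical proof record: proved atom -> clause used to prove it
    is_true -- stands in for the user answering "is this atom true?"
    """
    head, body = used[g]
    for a in body:
        # Ask the user about each body atom; recurse on the first false one.
        if not is_true(a):
            return debug_incorrect(a, used, is_true)
    # Every body atom is true (or the body is empty), yet the head is false,
    # so this clause itself must be incorrect.
    return (head, body)


# Illustrative proof record (assumed): the knowledge base wrongly states up_s2.
used = {
    "lit_l1": ("lit_l1", ["light_l1", "live_l1"]),
    "light_l1": ("light_l1", []),
    "live_l1": ("live_l1", ["live_w0"]),
    "live_w0": ("live_w0", ["live_w1", "up_s2"]),
    "live_w1": ("live_w1", []),
    "up_s2": ("up_s2", []),
}

# Simulated user answers: in the intended interpretation, switch s2 is down.
truth = {"lit_l1": False, "light_l1": True, "live_l1": False,
         "live_w0": False, "live_w1": True, "up_s2": False}

culprit = debug_incorrect("lit_l1", used, lambda a: truth[a])
print(culprit)
```

In this sketch the simulated user is asked about four atoms, and the search ends at the atomic clause for `up_s2`, which has no body atoms left to blame.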

This procedure can also be carried out by the use of the how command. Given a proof for g that is false in the intended interpretation, a user can ask how that atom was proved. This will return the definite clause that was used in the proof. If the clause was a rule, the user could use how to ask about an atom in the body that was false in the intended interpretation. This will return the rule that was used to prove that atom. The user can repeat this until a definite clause is found where all of the elements of the body are true (or there are no elements in the body). This is the incorrect definite clause. The method of debugging assumes that the user can determine whether an atom is true or false in the intended interpretation. The user does not have to know the proof procedure used.
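One way a system might support how is to record, for each atom it derives, the definite clause that was used to derive it. The following is only a sketch under that assumption: `prove_bottom_up` and `how` are hypothetical names, the proof procedure is a simple bottom-up (forward-chaining) one, and the knowledge base is an invented example.

```python
from typing import Dict, List, Tuple

# A definite clause is (head, body); an atomic clause has an empty body.
Clause = Tuple[str, List[str]]


def prove_bottom_up(kb: List[Clause]) -> Dict[str, Clause]:
    """Forward-chain over definite clauses until nothing new is derived,
    recording the first clause found that derives each atom."""
    used: Dict[str, Clause] = {}
    changed = True
    while changed:
        changed = False
        for head, body in kb:
            if head not in used and all(a in used for a in body):
                used[head] = (head, body)
                changed = True
    return used


def how(atom: str, used: Dict[str, Clause]) -> Clause:
    """Return the clause that was used to prove atom (KeyError if unproved)."""
    return used[atom]


# Invented knowledge base for illustration.
kb = [
    ("lit_l1", ["light_l1", "live_l1"]),
    ("light_l1", []),
    ("live_l1", ["live_w0"]),
    ("live_w0", ["live_w1", "up_s2"]),
    ("live_w1", []),
    ("up_s2", []),
]

used = prove_bottom_up(kb)
print(how("lit_l1", used))   # the rule used to prove lit_l1
print(how("live_l1", used))  # the user follows a body atom they know is false
print(how("up_s2", used))    # a clause with an empty body ends the search
```

A user who knows that `lit_l1` and `live_l1` are false in the intended interpretation would keep asking how about the false body atoms until reaching a clause whose body atoms are all true.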

The user or domain expert can find the buggy definite clause without having to know the internal workings of the system or how the proof was found. They only require knowledge about the intended interpretation and the disciplined use of how.

Missing Answers: The second type of error occurs when an expected answer is not produced. This manifests itself by a failure when an answer is expected. A goal g that is true in the domain, but is not a consequence of the knowledge base, is called a false-negative error.

The preceding algorithm does not work in this case, because there is no proof to examine. Instead, we must look for why there is no proof for g. An expected answer fails to be produced only if a definite clause or clauses are missing from the knowledge base. By knowing the intended interpretation of the symbols and by knowing what queries should succeed (i.e., what is true in the intended interpretation), a domain expert can debug a missing answer. Given an atom that failed when it should have succeeded, Figure 5.7 shows how to find an atom for which there is a missing definite clause. Suppose g is an atom that should have a proof, but which fails. Because the proof for g fails, the bodies of all of the definite clauses with g in the head fail.

Suppose one of these definite clauses for g should have resulted in a proof; this means all of the atoms in the body must be true in the intended interpretation. Because the body failed, there must be an atom in the body that fails. This atom is then true in the intended interpretation, but fails. So we can recursively debug it.
Otherwise, there is no definite clause applicable to proving g, so the user must add a definite clause for g.
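The two cases above can be sketched in Python. This is an illustration under stated assumptions rather than the algorithm of Figure 5.7 itself: `consequences` and `debug_missing` are hypothetical names, provability is computed by simple forward chaining, the user's answers are simulated by a lookup table, and the knowledge base is assumed to contain no cyclic clauses (otherwise the recursion might not terminate).

```python
from typing import Callable, List, Set, Tuple

# A definite clause is (head, body); an atomic clause has an empty body.
Clause = Tuple[str, List[str]]


def consequences(kb: List[Clause]) -> Set[str]:
    """Return the set of atoms derivable from kb by forward chaining."""
    proved: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for head, body in kb:
            if head not in proved and all(a in proved for a in body):
                proved.add(head)
                changed = True
    return proved


def debug_missing(g: str, kb: List[Clause],
                  is_true: Callable[[str], bool]) -> str:
    """Return an atom for which a definite clause is missing.

    Precondition: g is true in the intended interpretation but not provable.
    Assumes kb has no cyclic clauses.
    """
    proved = consequences(kb)
    for head, body in kb:
        # A clause for g whose body is entirely true should have applied.
        if head == g and all(is_true(a) for a in body):
            for a in body:
                if a not in proved:
                    # a is true but fails, so debug it recursively.
                    return debug_missing(a, kb, is_true)
    # No clause for g is applicable: a clause for g must be added.
    return g


# Invented knowledge base: the atomic clause for up_s2 has been left out.
kb = [
    ("lit_l1", ["light_l1", "live_l1"]),
    ("light_l1", []),
    ("live_l1", ["live_w0"]),
    ("live_w0", ["live_w1", "up_s2"]),
    ("live_w1", []),
]

# Simulated user: every atom mentioned is true in the intended interpretation.
missing = debug_missing("lit_l1", kb, lambda a: True)
print(missing)
```

Here the search descends from `lit_l1` through `live_l1` and `live_w0` to `up_s2`, the atom for which the user must supply a clause.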