Data Mining & Data Warehousing

Multi-relational Classification Using Tuple Id Propagation

Introduction: In this section we introduce Cross Mine, an approach that uses tuple ID propagation for multi-relational classification. To better integrate the information of ID propagation, Cross Mine uses complex predicates as elements of rules.

A complex predicate, p, contains two parts:

1.prop-path: This indicates how to propagate IDs. For example, the path “Loan: account_ID ->Account: account ID” indicates propagating IDs from Loan to Account using account_ID. If no ID propagation is involved, prop-path is empty.

2.constraint: This is a predicate indicating the constraint on the relation to which the IDs are propagated. It can be either categorical or numerical. op

A complex predicate is usually equivalent to two conventional predicates. For example, the rule “Loan(L; ) :-Loan(L; A;_;_;_; ), Account(A; ; frequent =monthly; )” can be represented by “Loan( ) :- [Loan.account_ID -> Account.account_ID, Account.frequency = monthly].”

Cross Mine builds a classifier containing a set of rules, each containing a list of complex predicates and a class label. The algorithm of Cross Mine is shown in Figure 9.23. Cross Mine is also a sequential covering algorithm like FOIL. It builds rules one at a time. After a rule r is built, all positive target tuples satisfying r are removed from the data

Algorithm: Cross Mine. Rule-based classification across multiple relations.

Input:

  • D, a relational database;
  • Rt a target relation.

Output:

A set of rules for predicting class labels of target tuples.

Method:

1.       rule set R <- 0;

2.       while (true)

3.       rule r <- empty-rule;

4.       set Rt to active;

5.       repeat

6.       Complex predicate p the predicate with highest foil gain;

7.       if foil gain(p) < MIN FOIL GAIN then

8.       break;

9.       else

10.   r r p; // append predicate, increasing rule length by 1

11.   remove all target tuples not satisfying r;

12.   update IDs on every active relation;

13.   if p.constraint is on an inactive relation then

14.   set that relation active;

15.   endif

16.   until (r:length = MAX RULE LENGTH)

17.   if r = empty-rule then break;

18.   R R∪{r};

19.   remove all positive target tuples satisfying r;

20.   set all relations inactive;

21.   endwhile

22.   return R;

set. To build a rule, Cross Mine repeatedly searches for the best complex predicate and appends it to the current rule, until the stop criterion is met. A relation is active if it appears in the current rule. Before searching for the next best predicate, each active relation is required to have the ID set of propagated IDs for each of its tuples. When searching for a predicate, Cross Mine evaluates all of the possible predicates on any active relation or any relation that is joinable with an active relation. When there are more than two classes of target tuples, Cross Mine builds a set of rules for each class.