Data Mining & Data Warehousing

Multiway Array Aggregation For Full Cube Computation

Input:

Ø  input: the relation to aggregate;

Ø  dim: the starting dimension for this iteration.

 

Globals:

Ø  constant numDims: the total number of dimensions;

Ø  constant cardinality[numDims]: the cardinality of each dimension;

Ø  constant min sup: the minimum number of tuples in a partition in order for it to be output;

Ø  outputRec: the current output record;

Ø  dataCount[numDims]: stores the size of each partition. dataCount[i] is a list of integers of size

Ø  cardinality[i].

 

Output:Recursively output the iceberg cube cells satisfying the minimum support.

Method:

(1) Aggregate (input); // Scan input to compute measure, e.g., count. Place result in outputRec.

(2) if input.count() == 1 then // Optimization

       Write Ancestors (input[0], dim); return;

          endif

(3) write outputRec;

(4) for (d = dim; d < numDims; d ) do //Partition each dimension

(5) C = cardinality[d];

(6) Partition(input, d, C, dataCount[d]); //create C partitions of data for dimension d

(7) k = 0;

(8) for (i = 0; i <C; i ) do // for each partition (each value of dimension d)

(9) c = dataCount[d][i];

(10) if c >= min sup then // test the iceberg condition

(11) output Rec.dim[d] = input[k].dim[d];

(12) BUC(input[k : : :k c], d 1); // aggregate on next dimension

(13) endif

(14) k =c;

(15) endfor

(16) outputRec.dim[d] = all;

(17) endfor

the resulting total (line 3). (Line 2 is an optimization feature that is discussed later in our example.) For each dimension d (line 4), the input is partitioned on d (line 6). On return from Partition (), data Count contains the total number of tuples for each distinct value of dimension d. Each distinct value of d forms its own partition. Line 8 iterates through each partition. Line 10 tests the partition for minimum support. That is, if the number of tuples in the partition satisfies (i.e., is ) the minimum support, then the partition becomes the input relation for a recursive call made to BUC, which computes the iceberg cube on the partitions for dimensions d 1 to numDims (line 12). Note that for a full cube (i.e., where minimum support in the having clause is 1), the minimum support condition is always satisfied. Thus, the recursive call descends one level deeper into the lattice. Upon return from the recursive call, we continue with the next partition for d. After all the partitions have been processed, the entire process is repeated for each of the remaining dimensions.