Knowledge Redundancy Approach to Reduce Size in Association Rules



Introduction
Mining for association rules has been one of the most studied fields in data mining. Its main goal is to find unknown relations among items in a database.
Given a set of items I containing all the items in the domain, a transactional database D is a set of transactions, each composed of a transaction id (tid) and a set of items that is a subset of I (an itemset).
An association rule is presented as an implication X → Y where X is the antecedent and Y is the consequent of the rule. Both X and Y are itemsets and usually, but not necessarily, they satisfy the property X ∩ Y = ∅. Association rules reflect how much the presence of the rule antecedent influences the presence of the rule consequent in the database records.
What generally makes a rule meaningful are two statistical factors: support and confidence. The support of a rule, supp(X → Y), is the fraction of database transactions that contain X ∪ Y, while the confidence, conf(X → Y), is a measure of certainty used to evaluate the validity of the rule: the fraction of transactions containing X that also contain Y. The association rule mining problem consists of finding all the rules that satisfy user-given thresholds for support and confidence. Most algorithms face the challenge with a two-step procedure: 1. Find all the itemsets whose support is equal to or greater than the support threshold.
2. Generate all association rules X → (Y − X), where Y is a frequent itemset, X ⊂ Y, and conf(X → (Y − X)) is equal to or greater than the confidence threshold.
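The two statistical factors above can be sketched on a toy database (the items and transactions below are hypothetical, not the paper's Table 1):

```python
# Toy illustration of support and confidence over a small transactional
# database D; each set is one transaction's itemset.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def support(itemset, db):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """supp(X -> Y) / supp(X): fraction of X-transactions also containing Y."""
    return support(antecedent | consequent, db) / support(antecedent, db)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666...
```

A mining algorithm would keep the rule bread → milk only if both values pass the user-given thresholds.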
The discovery of meaningful association rules can help in the decision-making process, but the quite large number of rules usually makes it difficult for decision-makers to process, interpret and apply them. A significant part of the rules presented to the user are irrelevant because they are obvious, too general, too specific, or not relevant for the decision topic. Several methods have been proposed in the literature to overcome this handicap, such as the development of interest measures, concise representations of frequent itemsets, and redundancy reduction. This paper proposes a new approach to deal with redundancy that takes the user's previous knowledge about the studied domain into account: previous knowledge is used to detect and prune redundant rules. We adapt the concept of redundancy and propose a procedure to perform the redundancy reduction process in the post-processing stage.
The paper is organized as follows. Section 2 discusses related work. In section 3 we propose an algorithm to find and prune redundant rules. In section 4 the proposed algorithm is applied to datasets with data about financial investment [1], the USA census [2] and mushrooms [2]. Section 5 closes the paper with conclusions.

Related work
Interestingness is difficult to define quantitatively [3], but most interestingness measures can be classified into objective measures and subjective measures. Objective measures are domain-independent; they are expressed in terms of statistics or information theory applied over the database. Several surveys [4,5,6] summarize and compare objective measures. The explosion of objective measures has raised a new problem: which metrics are the best to use in a specific situation and a particular application field? Several papers attempt to solve it [8,9], but it is far from being solved. The correlation between 11 objective rule interestingness measures and real human interest over eight different datasets was computed in [10], and there was no clear "winner": the correlation values associated with each measure varied considerably across the eight datasets.
Subjective measures were proposed in order to involve user knowledge explicitly in the selection of interesting rules. According to [11], subjective measures are classified into: -Unexpectedness: a pattern is interesting if it is surprising to the user.
-Actionability: a pattern is interesting if it can help the user to take some actions.
Actionability started as an abstract notion, with an unclear definition, but nowadays, several researchers are interested in it. The actionability problem is discussed in [12].
Unexpectedness or novelty [13] was proposed in order to solve the pattern triviality problem, assessing the surprise level of the discovered rules. Several techniques have been used to accomplish this aim: -Templates: templates are syntactic constraints that allow the user to define a group of rules that are interesting or not to him/her [14,15]. A template has the form A1, ..., Ak ⇒ Ak+1, where each Ai is a class name in a hierarchy or an expression E over a class name. Templates may be inclusive or restrictive. A rule is considered interesting if it matches an inclusive template and uninteresting if it matches a restrictive template. The use of templates is quite restrictive because the matching method requires each rule element to be an instance of an element in the template, and every template element must have at least one instance in the rule. Moreover, the template definition makes it hard to declare restrictive templates, because such a template should be composed of elements subsuming all attributes of the rule, being in a subsuming relation with the inclusive template elements.
The best-known form of templates is meta-rules [16,40]: a meta-rule expresses a relationship between two association rules. The main drawback of this approach is that meta-rules are restricted to having a single rule in their antecedent and consequent, so some important information may be lost.
-Belief: Silberschatz and Tuzhilin [11] defined user knowledge as a set of convictions, called beliefs, used to measure the unexpectedness of a pattern. Each belief is defined as a predicate formula expressed in first-order logic with an associated degree of confidence, measuring how much the user trusts the belief. Two types of belief were defined: -Soft beliefs are knowledge the user accepts to change if new evidence contradicts it. The interestingness of a new pattern is computed by how much the pattern changes the degree of the beliefs.
-Hard beliefs are knowledge the user will not change whatever new patterns are extracted. They are constraints that cannot be changed by new evidence.
This approach remained at a development stage and no further advances were published, so it is not operational.
-General Impressions: were presented in [17] and later developed in [18] and [19], where a specification language was developed to express expectations and goals. Three levels of specification were established: General Impressions, Reasonably Precise Concepts and Precise Knowledge. The concept of item taxonomies was integrated into the specification language in order to generalize rule selection. The matching process involves a syntactic comparison between antecedent/consequent elements: each element in the general impression must find a correspondent in the association rule.
-Logical Contradiction: was developed in [20]. It consists of extracting only those patterns which logically contradict the consequent of the corresponding belief.
An association rule X → Y is unexpected with respect to some belief A → B if: -Y ∧ B ⊨ FALSE, i.e., B and Y are in logical contradiction; -X ∧ A has an important support in the database (this condition eliminates rules which could be considered unexpected but do not concern the same transactions in the database); -the rule X, A → Y holds.
-Preference Model: was proposed in [21]. It is a specific type of user knowledge representing how the basic knowledge of the user, called knowledge rules (K), will be applied over a given scenario or tuples of the database. The user proposes a covering knowledge (Ct) for each tuple (t) -a subset of the knowledge rule set K that the user prefers to apply to the tuple t. The approach validates the transactions which satisfy the extracted rule.
All the previously presented works use some kind of knowledge to reduce the number of useless association rules in the final set. In this way, our approach is similar to them but there are some remarkable differences.
Like templates, our approach uses the syntactic notation of association rules to represent knowledge. Templates use this knowledge to constrain the structure of the selected rules, pruning out those rules which do not satisfy the template, but they still produce a lot of association rules with similar information. We, on the other hand, use the knowledge to remove rules with similar information, presenting to the user a set of unexpected rules that can help them better understand the underlying domain.
The approach followed by Belief tries to find only unknown rules, which is our main goal too, but it uses a complex and fixed formal knowledge representation based on first-order logic and degrees of belief, with no clear way of building and maintaining the belief system. Instead, we use a simpler and more natural rule-based form of knowledge, focused on the capability to grow the knowledge system interactively.

Rule redundancy reduction
The research community accepts the semantic definition of association rule redundancy given in [22]: "an association rule is redundant if it conveys the same information - or less general information - than the information conveyed by another rule of the same usefulness and the same relevance". But several formal definitions have been proposed over time. In table 1 a sample transactional database is presented. Defining a support threshold of 0.15 and a confidence threshold of 0.75, an association rule model with 92 rules is obtained; it is used below to illustrate the redundancy definitions.
Definition 1. Minimal non-redundant association rules [22]: An association rule R: X → Y is a minimal non-redundant association rule if there is no association rule R1: X1 → Y1 with supp(R) = supp(R1), conf(R) = conf(R1), X1 ⊆ X and Y ⊆ Y1. The algorithm in [23] tries to produce the set of minimal generators for each closed itemset. The number of closed association rules is linear in the number of closed frequent itemsets, which can be large for sparse and large datasets.
The Generic Basis (GB) and the Informative Basis (IB) [22] used Galois connections to propose two condensed bases that represent non-redundant rules. The Gen-GB and Gen-RI algorithms were presented to obtain the generic basis and a transitive reduction of the IB. The reduction ratio of the IB was improved in [24] using maximal closed itemsets. The Informative Generic Basis [25] also uses the Galois connection semantics, but takes the support of all frequent itemsets as input, so it can calculate the support and confidence of the derived rules. The augmented Iceberg Galois lattice was used to construct the Minimal Generic Basis (MGB) [26]. The concept of generator was incorporated into high utility itemset mining in [27].
The redundancy definition presented in definition 1 requires that a redundant rule and its corresponding non-redundant rule have identical confidence and identical support. However, the data in table 1 yield pairs of rules, both with the same confidence (e.g. supp = 0.41, conf = 1.0), that are non-redundant under definition 1 even though the consequent of one rule R can be obtained from another rule R1 with the same confidence and fewer conditions. Without R the same results are achieved, so rule R must be a redundant rule. Xu [28] formalizes this kind of redundancy in definition 2.
Definition 2. Redundant rules [28]: Let X → Y and X1 → Y1 be two association rules with confidence cf and cf1, respectively. X → Y is said to be a redundant rule to X1 → Y1 if X1 ⊆ X, Y ⊆ Y1 and cf ≤ cf1. Based on definition 2, the Reliable basis was proposed. It consists of two bases: ReliableApprox, used for partial rules, and ReliableExact, used for exact rules. Frequent closed itemsets are used to perform the reliable redundancy reduction process, which generates rules with minimal antecedents and maximal consequents. The Reliable basis removes a great amount of redundancy without reducing the inference capacity of the remaining rules. Phan [29] uses a more radical approach to define redundancy, see definition 3.
Definition 3. Representative association rules [29]: X → Y is said to be a representative association rule if there is no other interesting rule X1 → Y1 with X1 ⊆ X and Y ⊆ Y1. The redundancy definitions presented above do not guarantee the exclusion of all non-interesting patterns from the final model. Example 1 shows a group of rules that bring no new information to the user yet are not classified as redundant by the previous definitions.
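The structural test shared by definitions 1-3 can be sketched as a small predicate, following Xu's formulation (X1 ⊆ X, Y ⊆ Y1 and cf ≤ cf1); the rule tuples below are hypothetical, not taken from Table 1:

```python
# Hedged sketch of the structural redundancy test behind definitions 1-3:
# a rule (X, Y, conf) is redundant to (X1, Y1, conf1) when X1 is contained
# in X, Y is contained in Y1, and the more general rule is at least as
# confident.
def is_structurally_redundant(rule, other):
    (x, y, conf), (x1, y1, conf1) = rule, other
    return x1 <= x and y <= y1 and conf <= conf1

r  = (frozenset({"a", "b"}), frozenset({"c"}), 0.80)      # a,b -> c
r1 = (frozenset({"a"}), frozenset({"c", "d"}), 0.85)      # a -> c,d
print(is_structurally_redundant(r, r1))  # True: r adds nothing over r1
```

Note that this test only compares the two rules' structure and confidence; it cannot detect the knowledge-based redundancy discussed in Example 1 below.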

Example 1. A set of redundant rules from data in table 1
Let us examine a subset of association rules obtained from table 1. The item [loan]:[yes] in the consequent of R3 provides no new information, because it is already known from R1; so rule R3 is redundant, but this kind of redundancy is not detected by the previous definitions. Analyzing rules R1, R2 and R4 we can check that combining R1 and R2 transitively produces R4, so R4 is redundant; again, this kind of redundancy is not detected by the previous definitions. In the antecedents of R5, R6 and R7 the item [loan]:[yes] provides no new information, because it is already known from R1. It is redundant and must be pruned, but it cannot be detected by the previous redundancy definitions.

Post-processing
Since the year 2000, interest in post-processing methods for association rules has been increasing. Perhaps the most accurate characterization of post-processing tasks was given by Baesens et al. [30]: post-processing consists of different techniques that can be used independently or together: pruning, summarizing, grouping and visualization. We have a special interest in pruning techniques, which prune those rules that do not match the user's knowledge. Those techniques are associated with interestingness measures that may not satisfy the downward closure property, so it is impossible to integrate them into Apriori-like extraction algorithms.
An element to consider is the nature of Knowledge Discovery in Databases (KDD) as an interactive and iterative user-centered process. Enforcing constraints during the mining runs neglects this character of KDD [31,32]: a single, possibly expensive mining run is accepted, but all subsequent mining questions are supposed to be satisfied from the initial result set.
In this work, a method is developed to obtain association rules that are non-redundant with respect to user knowledge. It is important to ensure the user's capability to refine his/her knowledge in an interactive and iterative way, accepting any of the discovered associations or discarding some previous associations and updating the prior knowledge. This approach also makes it possible to fulfill the mining questions of different users, with different domain knowledge, in a single mining run.

A knowledge guided approach

Knowledge based redundancy
In example 1, a group of redundant rules not covered by the current definitions of redundancy is shown. Our interest is to eliminate these forms of redundancy in association rule models, based on a core set of rules that represents the user's beliefs, the result of his/her experience in the subject area. This knowledge is more general than the rules obtained in the mining process, which only represent a particular dataset with partial information, so the quality metric value for this kind of rule is considered maximal. This set of rules will be named prior knowledge. A rule that does not contradict the prior knowledge of the user will be considered redundant. We formalize the notion of prior knowledge redundancy in definition 4. The user can represent previous knowledge in different ways, such as semantic networks or ontologies, among others.
Considering that the expert is interested in association rule discovery, prior knowledge is incorporated into the model using the association rule format, as an expert working with the dataset presented in table 1 would do.
Definition 4. Knowledge Based Redundancy: Let S be a set of association rules and Sc a set of prior known rules defined over the same domain as S. An association rule R : X → Y ∈ S is redundant with respect to Sc if there is a rule R' : X' → Y' ∈ Sc fulfilling some of the following conditions:
-X' ⊆ X ∧ Y ∩ Y' ≠ ∅: part or the whole of the rule consequent is already implied by a previously known rule;
-X' ⊆ X and there is a third rule R'' : X'' → Y'' in Sc with X'' ⊆ Y' and Y ∩ Y'' ≠ ∅: the rule follows transitively from previously known rules;
-X' ⊆ X ∧ Y' ∩ X ≠ ∅: the rule antecedent contains part or the whole information of a previously known rule;
-X' ⊆ Y ∧ Y ∩ Y' ≠ ∅: the rule consequent contains part or the whole information of a previously known rule.
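The pairwise containment conditions of definition 4 can be sketched as a small predicate. This is a minimal illustration under our reading of the definition (the transitive condition involving a third rule is omitted), and the item names are hypothetical:

```python
# Sketch of the pairwise conditions of knowledge-based redundancy: given a
# mined rule X -> Y and a prior-knowledge rule Xp -> Yp, any of the three
# tests below marks part of the mined rule as already known.
def knowledge_redundant(x, y, xp, yp):
    return bool(
        (xp <= x and y & yp)     # consequent item implied by the prior rule
        or (xp <= x and yp & x)  # antecedent item implied by the prior rule
        or (xp <= y and y & yp)  # consequent item implied via the consequent
    )

# Hypothetical prior knowledge: "loan=yes -> approved=yes".
prior = (frozenset({"loan=yes"}), frozenset({"approved=yes"}))
# Mined rule "loan=yes, age=young -> approved=yes" repeats that knowledge.
mined = (frozenset({"loan=yes", "age=young"}), frozenset({"approved=yes"}))
print(knowledge_redundant(*mined, *prior))  # True
```

The actual algorithm in the paper performs these checks through itemset closures rather than direct containment tests, as described below.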
Reviewing the rules in example 1 with definition 4, each of them is detected as redundant. Armstrong's axioms [33] are a set of inference rules. They allow obtaining the minimum set of functional dependencies maintained in a database; the rest of the functional dependencies can be derived from this set. They are part of the well-understood mechanisms designed to find smaller subsets of a larger set of functional dependencies, called "covers", which are equivalent to the "bases" in Closure Spaces and Data Mining.
Armstrong's axioms cannot be used as an inference mechanism for association rules [34] because it is impossible to obtain the values of support and confidence of the derived rules: -Transitivity (if A → B and B → C then A → C) does not hold: even when both rules hold with confidence ≥ threshold, we cannot know the value of conf(A → C).
-Augmentation (if A → B then AC → B) does not hold: enlarging the antecedent of a rule may give a rule with much smaller confidence, even zero. Think of a case where most of the times X appears it comes with Z, but it comes with Y only when Z is not present; then the confidence of X → Z may be high whereas the confidence of XY → Z may be null.
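The augmentation counterexample can be checked numerically on a hypothetical database of the kind just described:

```python
# Numeric check of the augmentation counterexample: conf(X -> Z) is high,
# yet conf(XY -> Z) drops to zero (transactions are hypothetical).
db = [
    {"x", "z"}, {"x", "z"}, {"x", "z"},  # x usually appears with z...
    {"x", "y"},                          # ...but with y only when z is absent
]

def conf(ante, cons, db):
    """Confidence: fraction of ante-transactions that also contain cons."""
    has_ante = [t for t in db if ante <= t]
    return sum(cons <= t for t in has_ante) / len(has_ante)

print(conf({"x"}, {"z"}, db))       # 0.75
print(conf({"x", "y"}, {"z"}, db))  # 0.0 -> augmentation does not hold
```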
Our intention is to use Armstrong's axioms to assess whether a rule has prior knowledge redundancy with respect to a set of rules Sc from previous knowledge, i.e., whether it fulfills some condition of definition 4.
Condition X' ⊆ X ∧ Y ∩ Y' ≠ ∅ represents the classical definition of redundancy, as in definitions 1, 2 and 3. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y and R2 : X' → Y' be association rules and suppose Y ∩ Y' = Y1. By the reflexivity axiom on the consequent of R2, R3 : Y' → Y1, and by reflexivity on the consequent of R1, R4 : Y → Y1. By transitivity between R1 and R4 we have R5 : X → Y1, and by transitivity between R2 and R3 we have R6 : X' → Y1. Since X' ⊆ X by the statement condition, applying augmentation to R6 until the antecedent equals X gives R7 : X → Y1. Therefore Armstrong's axioms detect this condition.
The second condition represents the notion of transitivity, a common pattern in human thinking. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y be the studied rule and R2 : X' → Y', R3 : X'' → Y'' be prior rules with X' ⊆ X and X'' ⊆ Y', and suppose Y ∩ Y'' = Y1. By the reflexivity axiom on the consequent of R1, R4 : Y → Y1, and by transitivity between R1 and R4 we have R5 : X → Y1. Since X'' ⊆ Y' by the statement condition, reflexivity on the consequent of R2 gives R6 : Y' → X''. By transitivity between R2 and R6 we have R7 : X' → X'', and by transitivity between R7 and R3 we have R8 : X' → Y''. Applying augmentation to R8 until the antecedent equals X gives R9 : X → Y''. By reflexivity on the consequent of R9, R10 : Y'' → Y1, and by transitivity between R9 and R10 we have R11 : X → Y1. Therefore Armstrong's axioms detect this condition.
Condition X' ⊆ X ∧ Y' ∩ X ≠ ∅ represents the case when an item in the antecedent of a rule is redundant. Let R1 : X → Y and R2 : X' → Y' be rules and suppose Y' ∩ X = X1. By augmentation of X1 in R2 we have R3 : X'X1 → X1Y', and by transitivity between R3 and R1 we obtain R4 : X → Y, a derivation in which X1 is supplied by the prior rule rather than by the antecedent. Therefore Armstrong's axioms detect this condition.
Condition X' ⊆ Y ∧ Y ∩ Y' ≠ ∅ represents the case when an item in the consequent of R is redundant with respect to other items in the consequent. This condition is fulfilled if a single attribute in Y is redundant. Let R1 : X → Y and R2 : X' → Y' be rules and suppose Y ∩ Y' = Y1. By the reflexivity axiom on the consequent of R2, R3 : Y' → Y1, and by transitivity between R2 and R3 we have R4 : X' → Y1. Since X' ⊆ Y by the statement condition, reflexivity on the consequent of R1 gives Y → X', so by transitivity between R1, Y → X' and R4 we have R5 : X → Y1. Therefore Armstrong's axioms detect this condition.
We do not use Armstrong's axioms as an inference mechanism, so it does not matter that they cannot ensure the support and confidence thresholds in the inferred rules.

Algorithm to eliminate prior knowledge redundancy in association rules
In this section we present an algorithm to determine if a rule contains redundant items, see Fig. 1. The closure algorithm presented in [35] is used to compute X + .
Require: a set of previous knowledge rules Sc and a rule Ri in the form X → Y.
Ensure: a Boolean value indicating whether the rule is redundant.
To determine the redundancy of a rule X → Y we have to test whether some item A of the antecedent, or some item W of the consequent, is redundant. The item A is redundant if the consequent can be derived from the prior knowledge without A. The first part of algorithm 1 performs this task for every item A ∈ X by computing the closure of the new antecedent X − {A} over the previous knowledge rules joined with the studied rule, and comparing the result with the closure of the same antecedent over the previous rules joined with a new rule in which the item A is not part of the antecedent. If both results are equal, then the item A is redundant and so is the entire rule. To test whether an item W of the consequent is redundant a similar procedure is applied; the second part of algorithm 1 performs this task.
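The procedure just described can be sketched as follows. This is our reading of algorithm 1, not the paper's pseudocode; the function and variable names are ours, and rules are represented as pairs of frozensets:

```python
# Hedged sketch of the redundancy test: a rule X -> Y is redundant w.r.t.
# the prior rules Sc if some antecedent item A (or consequent item W) can
# be dropped without changing what the closure derives.
def closure(items, rules):
    """Saturate `items` under the rules (classical attribute-closure loop)."""
    result = set(items)
    changed = True
    while changed:
        changed = False
        for ante, cons in rules:
            if ante <= result and not cons <= result:
                result |= cons
                changed = True
    return frozenset(result)

def is_redundant(x, y, sc):
    for a in x:                              # test each antecedent item A
        reduced = x - {a}
        with_rule = sc + [(x, y)]
        without_a = sc + [(reduced, y)]
        if closure(reduced, with_rule) == closure(reduced, without_a):
            return True
    for w in y:                              # test each consequent item W
        with_rule = sc + [(x, y)]
        without_w = sc + [(x, y - {w})]
        if closure(x, with_rule) == closure(x, without_w):
            return True
    return False

# Hypothetical prior knowledge: "b -> c".
sc = [(frozenset({"b"}), frozenset({"c"}))]
# The mined rule "a, b -> c" is redundant: b alone already derives c.
print(is_redundant(frozenset({"a", "b"}), frozenset({"c"}), sc))  # True
```

The closure loop plays the role of the closure algorithm of [35]; any correct attribute-closure implementation could be substituted.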

Correctness
We first prove that the closure algorithm [35] can be used to detect redundancy according to definition 4. The closure algorithm applies Armstrong's axioms to find all items implied by a given itemset. Theorem 1. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ X ∧ Y ∩ Y' ≠ ∅, then Y ∩ Y' ⊆ X+Sc. Proof. X' ⊆ X+Sc by the assumption X' ⊆ X and the reflexivity axiom. So Y' ⊆ X+Sc by transitivity between X → X' and X' → Y'. Therefore Y ∩ Y' ⊆ X+Sc by the definition of set intersection.
Theorem 2. Let Sc be a set of prior known rules and R : X → Y an association rule. If there are rules R' : X' → Y' and R'' : X'' → Y'' in Sc with X' ⊆ X, X'' ⊆ Y' and Y ∩ Y'' ≠ ∅, then Y ∩ Y'' ⊆ X+Sc. Proof. X' ⊆ X+Sc by the assumption X' ⊆ X and the reflexivity axiom. Y' ⊆ X+Sc by transitivity between X → X' and X' → Y'. X'' ⊆ X+Sc by the assumption X'' ⊆ Y' and the subset definition. So Y'' ⊆ X+Sc by transitivity between X → X'' and X'' → Y''. Therefore Y ∩ Y'' ⊆ X+Sc by the subset definition.
Theorem 3. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ X ∧ Y' ∩ X ≠ ∅, then Y' ∩ X ⊆ (X − (Y' ∩ X))+Sc. Proof. Assume X' ⊆ X ∧ Y' ∩ X ≠ ∅. Then X' ⊆ (X − (Y' ∩ X))+Sc by the assumption X' ⊆ X and the reflexivity axiom. Y' ⊆ (X − (Y' ∩ X))+Sc by transitivity between (X − (Y' ∩ X)) → X' and X' → Y'. Therefore Y' ∩ X ⊆ (X − (Y' ∩ X))+Sc by the definition of set intersection.
Theorem 4. Let Sc be a set of prior known rules and R : X → Y an association rule. If there is a rule R' : X' → Y' ∈ Sc with X' ⊆ Y ∧ Y ∩ Y' ≠ ∅, then Y ∩ Y' ⊆ X+Sc∪{X→(Y−(Y∩Y'))}. Proof. X' ⊆ X+Sc∪{X→(Y−(Y∩Y'))} by the assumption X' ⊆ Y and the added rule X → (Y − (Y ∩ Y')). Y' ⊆ X+Sc∪{X→(Y−(Y∩Y'))} by transitivity between X → X' and X' → Y'. Therefore Y ∩ Y' ⊆ X+Sc∪{X→(Y−(Y∩Y'))} by the definition of set intersection.
The Hoare triple was introduced by C. A. R. Hoare [38] as {P}C{Q}, for specifying what a program does. In such a Hoare triple: -C is a program.
-P and Q are assertions, conditions on the program variables used in C. They are written using standard mathematical notation together with logical operators. We can use functions and predicates to express high-level properties based on a domain theory [39] covering the specifics of the application area.
We say {P}C{Q} is true if, whenever C is executed in a state satisfying P and the execution of C finishes, then the state in which C finishes satisfies Q. If there is a loop in C, loop invariants must be used to prove correctness: if the loop invariant is proved to hold after each loop iteration, then the postcondition can be proven true.
In algorithm 1, lines one through eight and lines nine through sixteen perform basically the same operation, one over the rule consequent and the other over the rule antecedent, so we analyze only one of them. Line four checks whether Y[i] is a subset of the closure, so the closure algorithm must be computed; that algorithm has been proved correct [35]. The search for Y[i] within the closure can be done by a well-known linear search algorithm, which we assume correct. Preconditions: -Sc is a set of previous knowledge rules.
-X → Y is an association rule with X = X1, .., Xn and Y = Y1, .., Ym.
Loop invariant: if the loop is executed j or more times, then after j executions i = j and Y[h] has been found non-redundant for 0 ≤ h < j.
Proving the loop invariant (by induction on j). Base case: j = 0; before the first execution of the loop, i = 0 and no element has been examined. Inductive hypothesis: assume that if the loop iterates j times then the loop invariant holds with i_old = j. We prove that if the loop iterates j + 1 times then the invariant holds for i_new = j + 1: the invariant held for i_old = j and i_new = i_old + 1; because the loop iterated for i_old = j we have i_old < m and i_new ≤ m. Thus the loop invariant holds for j + 1.
When the loop test fails, the loop invariant holds and either i = m, so Y[h] is non-redundant for 0 ≤ h < m and no element in the consequent is redundant, or a redundant element was found and true is returned. Conclusion: the postcondition is satisfied in either case, so the algorithm is correct.

Complexity analysis
The time complexity of an algorithm is a function T(n) bounding the maximum number of steps of the algorithm for an input of size n. T(n) depends on what is counted as one computation step; the random access machine (RAM) model is the most widespread one. RAM is a model of a simple digital computer with random access memory. For the sake of simplicity T(n) is approximated by a simpler function: we write T(n) = O(f(n)) if there are constants c ≥ 0 and n1 ≥ 0 such that T(n) ≤ cf(n) for all n ≥ n1.
For the algorithm in Fig. 1 we denote by a the number of different attribute symbols in Sc and by p the number of previous knowledge rules in Sc. The closure can be computed in time linear in the size of the rule set, O(p), see [35]. The execution time of the first while loop (over the consequent of the rule) is a · p, since each of its iterations computes a closure over the p rules of Sc with cost O(p). The second while loop (over the antecedent of the rule) takes the same a · p because it performs the same operation, so its complexity is likewise O(ap). To compute the complexity of the entire algorithm, the complexities of the two while loops are added: O(ap) + O(ap) = 2 · O(ap); the constant 2 can be ignored, so the final complexity of the algorithm is O(ap).
Association rule extraction algorithms have much higher complexity [36] than the reduction approach presented here. This difference led us to propose a reduction mechanism in which the rule extraction algorithm is executed once and then, in the post-processing stage, the reduction algorithm is run to prune the redundant rules, rather than applying prior knowledge as a restriction within the extraction algorithm, which would force it to be executed again for each different user and even for each change in a user's prior knowledge. The computational cost of the constraint approach is very high, whereas our approach allows us to run a simpler routine in the post-processing stage whenever the user changes or the user's prior knowledge is updated. The temporal cost of this approach did not exceed 5 seconds in any of the applied tests.

Methodology
In order to verify the effectiveness of our approach we performed experiments with four datasets: the first with data about the USA census [2], the second with data about stock market investments [1], the third with data about hypothetical samples of mushrooms [2] and the last with data about breast cancer [2]. The prior knowledge consists of 6 rules for each dataset. We use the Pruning Ratio metric PR = (PrunedRules / TotalRules) × 100 to evaluate our results. Table 2 shows the results of the experiments. Each row corresponds to an experiment following these steps: 1. Find the complete set of rules using as support threshold the value in column 2 and as confidence threshold the value in column 3. The number of rules is shown in column 4.
2. Apply the steps presented in algorithm 1. The number of pruned rules is presented in column 5 of Table 2. 3. After applying the algorithm to the dataset, the final number of rules is presented in column 6 of Table 2, while column 7 contains the pruning ratio and column 8 the execution time.
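For clarity, the Pruning Ratio metric can be written as a one-line helper; the numbers below are illustrative, not taken from Table 2:

```python
# Pruning Ratio: percentage of the extracted rules removed by the reduction.
def pruning_ratio(pruned_rules, total_rules):
    return pruned_rules / total_rules * 100

print(pruning_ratio(830, 1000))  # 83.0
```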

Results and discussion
The Pruning Ratio changes with the support in the Census and Stocks datasets, first increasing as the support increases; but when the support is greater than 0.07 for the Census dataset and greater than 0.5 for the Stocks dataset, the Pruning Ratio decreases as the support increases. The behavior on the Mushroom dataset is the opposite: the Pruning Ratio decreases as the support increases until the support reaches 0.5, then it increases with the support. This behavior shows a relation between support and previous knowledge patterns. When the support threshold is raised, a number of rules no longer meet it and are discarded. As long as the discarded rules have no major impact on the rules derived from previous knowledge, the Pruning Ratio increases; but as the support grows further it starts to remove the rules derived from previous knowledge, so the Pruning Ratio decreases.
Fig. 1, Fig. 2 and Fig. 3 illustrate this behavior for each dataset.

Traditional vs. knowledge based reduction
The approach developed in this paper differs from those published until now. Previous works are concerned with the structural relationship between association rules and with mechanisms to reduce redundancy using inference rules and maximal itemsets. We use the user's experience to prune rules that do not bring new knowledge to the user, simplifying decision making. The two kinds of approach are not comparable in essence, but we carried out experiments to compare KBR's pruning ratio with previous works. Fig. 4 shows the pruning ratio of some relevant works in redundancy reduction over the Mushroom dataset with a support value of 0.3. We used the Mushroom dataset because we have access to the authors' experiments and it is sufficient to test our case. The values for the pruning ratio are taken from the authors' papers: MinMax, Reliable, GB, CHARM, CRS and MetaRules [40]. Reliable has the best Pruning Ratio, see Fig. 4, so we compare it with our approach at different support values, see Table 3.
Reliable's Pruning Ratio is better than those of KBR 6rules, KBR 9rules and KBR 12rules. Nevertheless, KBR 15rules reaches a better Pruning Ratio than Reliable for all supports except 0.4, see Fig. 6. A previous knowledge of 15 rules is equivalent to 0.018% of the whole rule set for a support value of 0.3, and to 7.9% for a support value of 0.7.
With very few rules, KBR makes it possible to exceed the Pruning Ratio of previous works. Of course, there is a close relationship between the Pruning Ratio and the repercussion of the previous knowledge rules over the whole set of rules: the Pruning Ratio increases as the knowledge rules become better able to describe the domain under study. The better the KBR results are, the better the user knows the domain under study. Our approach also makes it possible to determine when a model cannot be improved, as in the case of KBR 15rules for a support value of 0.7, where the Pruning Ratio is 100%.

Knowledge vs knowledge based reduction
In section 2 we surveyed some works that use knowledge to reduce the number of association rules presented to the final user. The main goal of those papers is to obtain a set of association rules that satisfies some constraints provided by users, using different forms of knowledge representation. They are able to reduce the cardinality of the association rule set, but they generate a lot of rules that represent the same knowledge. Strictly speaking, we cannot compare our proposal with those ones because of the difference in goals, but we want to test the cardinality reduction capability of our approach against templates, the best-known form of the knowledge approach. We compare the pruning ratio of our approach with the template implementation proposed in [41], which outperforms the implementation proposed in [16], across five datasets from [2]. The continuous attributes in the datasets used were discretized using 4-bin equal-frequency discretization. Support and confidence were set to the same values used in [16]. In table 4 we present the results of our pruning approach (KBR) and compare them with the previous work (MetaRules) [41].
Each row in table 4 represents an experiment, where column Dataset contains the dataset id, column TotalRules shows the total number of rules produced by the extraction algorithm, column MetaRules presents the remaining rules after applying the algorithm proposed in [41], and column KBR contains the average number of remaining rules over ten runs of the knowledge-based redundancy elimination algorithm, using a random knowledge of ten rules for each execution. The number of remaining rules in our approach is lower than in the MetaRules approach for all datasets.

Conclusion
The fundamental idea of this work is linked to the main definition of data mining, the analysis of large amounts of data to extract interesting, previously unknown patterns, and to the consideration that an association rule corresponding to prior knowledge is a redundant one [37]. Our approach prunes those rules, presenting a simpler model to the final user.
The main contribution of this work is the definition of redundancy of association rules with respect to prior knowledge, and the definition of a mechanism to eliminate this kind of redundancy from the final model of association rules presented to the end user. The redundancy elimination is performed by two procedures: the first detects and prunes redundant elements in rule antecedents and consequents, and the second detects whether all the information provided by a rule is redundant with respect to prior knowledge and then prunes it. The results of this study confirm that it is possible to use the prior knowledge of experts to reduce the volume of association rules. Association rule models with fewer rules can be interpreted more clearly by specialists, generating advantages in the decision-making process. The experimental results show that prior knowledge of less than 10% of the rule set can reach a reduction ratio above 90%.