Mining Multi-Dimensional Intra and Inter-Association Patterns of Call Records for Targeted Advertising using Multi-Granulation Rough Sets

Customer contacts to various businesses, as identified in telecom call records, convey customers' interest in availing those services. The multi-dimensional dependence of such communications on day and time generates useful insights for targeted advertising. In addition, frequent and significant inter patterns of service associations give the probability that takers of one service may also be prospects for another. This work presents a multi-granulation rough sets model to address the issue of prospect discovery from interest traits depicted in call records. The proposed method addresses problems inherent in traditional intra and inter pattern mining methods, such as high computational complexity and a large space of statistically insignificant patterns. The algorithm is tested to generate a target audience for food and restaurant businesses using one month of anonymous call records from a Thailand-based telecom service provider. Some interesting mathematical properties of the underlying knowledge structures are also validated.


Introduction
Marketers face the challenge of reaching the right audience for product promotion. Similarly, telecom service providers hold heaps of knowledge nuggets in the form of call records but face challenges in transforming that data into revenue. Subscribers, in turn, want schemes and discounts on products or services. Thus, there is a need to transform raw data into knowledge to achieve a three-way win-win situation. An enormous proliferation of databases in almost every area of human endeavor has created a high demand for new, powerful tools for turning data into useful, task-oriented knowledge. Data mining deals with the science of converting data into knowledge [1]. Under this discipline, multi-dimensional association patterns generate extra information about associated attribute values in the same transaction. Similarly, inter patterns find attribute dependence over a temporal span. In the context of call records, multi-dimensional association patterns with the day and time at which a service is called give knowledge of the best day and hour for targeted marketing. Since each transaction in the call records corresponds to interest in one business, information about associated services over a temporal span can be found using inter patterns of such associations. Various algorithms have been proposed in the literature for intra (association) pattern mining and inter (sequential) pattern mining [2]-[5]. Feng et al. [3] studied multi-dimensional intra pattern mining with application to meteorological data, then gave a rough set framework for mining generalized inter patterns and highlighted the challenges of setting a minimum support to measure pattern interestingness. Variants such as closed patterns and maximal patterns address the efficiency and large result space of these methods [2], [8]-[12].
In all the above literature, pattern interestingness is measured using two metrics, "Support" (the probability of the presence of an attribute in a database) and "Confidence" (the conditional probability of attribute co-occurrence), representing the commonness and strength of an attribute association. Only those patterns that cross the minimum support and confidence pre-defined by the user are considered interesting.
However, recent research [13], [14] highlights various flaws in the current framework for mining patterns by these metrics. Pattern pruning using minimum support and confidence criteria generates both statistical Type I errors (accepting spurious rules) and Type II errors (rejecting valid association rules). The author illustrated that pruning using the significance of dependence measure (t) with a threshold of t > 2 yields all non-redundant patterns. Also, the degree of dependence can be an appropriate criterion, besides support, for measuring attribute dependence. These interestingness metrics are especially applicable to voluminous call records to businesses: some attribute values are overrepresented and others underrepresented, so traditional support-confidence based pruning will either filter out prospects or generate an enormous pattern space.
Most of the research in multi-dimensional intra and inter pattern mining is based on apriori pruning systems. According to recent research on interestingness measures, such systems generate both false positive and false negative implication relations. Parallel research in granular computing emphasizes data mining on granular structures, which are abstract, natural-language formulations on information spaces [15]. The theory gives a computing paradigm in which an information granule is a clump of objects drawn together by indistinguishability, similarity and proximity of functionality [16]-[18]. Qian, Liang, Yao, & Dang extended Pawlak's rough set model to multi-granulation rough sets. They presented several pivotal algorithms and provided a mechanism for problem-solving and rule extraction using multi-granulations. Yang et al. [15] illustrated theories related to hierarchical structures on partition-based multi-granulation spaces. They presented optimistic and pessimistic rough set approximations of granular knowledge spaces, which utilize a family of equivalence relations. The authors also defined hierarchical structures on partition-based multi-granulation spaces, which create finer and coarser knowledge structures.
This work combines research on pattern interestingness measures with rough set based granulation and partition-based knowledge structures to address issues in present multi-dimensional intra and inter pattern mining methods, such as high algorithmic time complexity and a huge, error-prone pattern space of implication relations.
The idea originated from applying conventional apriori-like methods to mine a target audience for product promotion from previous calls to similar businesses. The proposed method solves the problem of call-record based prospecting for services using the ideas of multi-granulation partitioning and algorithms for generating lower approximations in MGRS (Multi-Granulation Rough Sets). The proposed information retrieval system derives intra patterns of the most common and significant attribute dependence of day and hour with the services under study. This knowledge is useful for identifying the best day and hour for service-specific promotions. We further derive inter patterns of service associations. Knowledge of common and significant attribute dependence between services enables a potentially wider audience for service promotions. The method is tested to extract prospective customers for food and restaurant businesses (café, food ordering services, restaurant and bakery) from granular information in the call records of anonymous subscribers of a telecom company in Thailand. The best day and hour of promotion for all the food and restaurant businesses are derived using multi-dimensional association rules, and inter pattern mining is used to generate prospects for other related entertainment concepts such as travel agencies, beauty salons and nightclubs. Besides deriving statistically valid implication rules that are directly usable for decision making, the proposed rough set based intra and inter pattern mining method has lower time complexity and a smaller result space than traditional methods. The paper also presents a validation of the mathematical properties of the multi-granulation knowledge structures so formed.

Preliminaries
This section explains some preliminary definitions and concepts underlying the problem addressed and underlying knowledge structures.

Call data records [19]
Given an information system U of call records characterized by (1), X is the set of subscribers under study; B is the set of businesses for which prospects are desired; and the $A_i$ are other multi-dimensional attributes under study, such as day of call, hour of contact and subscriber location. Each $A_i$ is a set of attributes $A = \{a_1, a_2, \dots, a_m\}$; $V = \bigcup V_a$, where $V_a$ is the set of values of attribute $a \in A$; and $f: X \times A \to V_a$ is an information function such that $f(x, a_i) \in V_a$ for all $x \in X$ and $a_i \in A$ (2).

Concept hierarchy [20]
Let $V_{a'} = \{a'_1, a'_2, \dots, a'_m\}$ be the domain of attribute set $A'$ such that $a_i \in A \subseteq a'$, meaning that $A'$ is a set of linguistic representations of the elements of A. Further, $A''$ can be a set of still higher order attributes that describe $V_{a'}$ as well, creating hierarchical knowledge structures about $V_a$. We create the knowledge base K with linguistic representations of the attributes in U.

Rough set [16], [18]
Let $U \neq \emptyset$ be a universe of discourse. If $A'$ is a family of equivalence relations on U, then the pair $K = (U, A')$ is referred to as a knowledge base. If $P \subseteq A'$ and $P \neq \emptyset$, then $\bigcap P$ (the intersection of all equivalence relations in P) is also an equivalence relation, denoted by IND(P). It is referred to as the indiscernibility relation over P in Pawlak's rough set theory. For all $R \in A'$, we use U/R to represent the family of equivalence classes generated by the equivalence relation R. Therefore, for all $x \in U$, $[x]_R$ denotes the equivalence class of R that contains x. Here, the attribute set $A'$ is linguistic knowledge about A in U. For example, in the case of call record transactions, X represents the subscriber identification; the attribute $A_1$ for interest, that is, the "called number", maps into the knowledge base as $A_1'$, that is, the business subcategory called; and the other $A_i$ represent other taxonomical knowledge such as business category, day of the call and hour of the call.

Lower and upper approximation [16]
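Let $K = (U, A')$ be a knowledge base and $P \subseteq A'$. Then, for all $X \subseteq U$, the lower approximation and upper approximation of X are denoted by $\underline{P}(X)$ and $\overline{P}(X)$ respectively, where $\underline{P}(X) = \bigcup \{Y \in U/IND(P) : Y \subseteq X\}$ and $\overline{P}(X) = \bigcup \{Y \in U/IND(P) : Y \cap X \neq \emptyset\}$.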
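The two approximations can be illustrated with a minimal Python sketch. The helper names (`partition`, `lower_approximation`, `upper_approximation`) and the toy universe are assumptions chosen for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def partition(universe, key):
    """Return U/R: the equivalence classes of `universe` under `key`."""
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)
    return list(classes.values())


def lower_approximation(classes, target):
    """Union of the classes that lie entirely inside `target`."""
    return set().union(*(c for c in classes if c <= target))


def upper_approximation(classes, target):
    """Union of the classes that share at least one object with `target`."""
    return set().union(*(c for c in classes if c & target))


# Toy universe (hypothetical): objects are indiscernible when equal modulo 3.
U = {1, 2, 3, 4, 5, 6}
classes = partition(U, key=lambda x: x % 3)  # classes {3, 6}, {1, 4}, {2, 5}
X = {1, 2, 4}

lower = lower_approximation(classes, X)
upper = upper_approximation(classes, X)
print(sorted(lower), sorted(upper))
```

Here only the class {1, 4} fits wholly inside X, while {2, 5} merely overlaps it, so the lower approximation is contained in X and the upper approximation contains X.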

Multi-granulation rough sets [15], [21]
Multi-granulation rough sets are constructed by the family of equivalence relations leading to a family of granular spaces. Two possible constructs used in work are optimistic and pessimistic multi-granulation rough sets.

Optimistic multi-granulation rough sets
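In the optimistic multi-granulation lower approximation over multiple independent granular spaces, at least one of the granular spaces needs to satisfy the inclusion condition between the equivalence class and the target. The upper approximation is defined as the complement of the optimistic multi-granulation lower approximation of the complement of the target [15], [22]. Let $K = (U, A')$ be a knowledge base in which $R_1, R_2, \dots, R_m \in A'$. Then, for all $X \subseteq U$, the optimistic multi-granulation lower and upper approximations of X are denoted by $\underline{\sum_{i=1}^{m} R_i}^{O}(X) = \{x \in U : [x]_{R_1} \subseteq X \lor \dots \lor [x]_{R_m} \subseteq X\}$ and $\overline{\sum_{i=1}^{m} R_i}^{O}(X) = {\sim}\underline{\sum_{i=1}^{m} R_i}^{O}({\sim}X)$, where ${\sim}X$ is the complement of X in U.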

Pessimistic multi-granulation rough sets [15] [22]
In pessimistic multi-granulation rough sets, the lower approximation is formed from a family of equivalence relations under the condition that all granular spaces satisfy the inclusion condition between the equivalence class and the target. The upper approximation of the pessimistic multi-granulation rough set is the complement of the pessimistic lower approximation of the complement of the target.
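The difference between the optimistic ("at least one granular space") and pessimistic ("all granular spaces") conditions can be sketched as follows. The function names and the two toy partitions are hypothetical, and each partition is assumed to cover the whole universe.

```python
def block_of(partition, x):
    """Return the equivalence class of `partition` that contains x."""
    for block in partition:
        if x in block:
            return block
    raise ValueError(f"{x!r} is not covered by the partition")


def optimistic_lower(partitions, universe, target):
    """x qualifies if its class fits inside `target` in at least one partition."""
    return {x for x in universe
            if any(block_of(p, x) <= target for p in partitions)}


def pessimistic_lower(partitions, universe, target):
    """x qualifies only if its class fits inside `target` in every partition."""
    return {x for x in universe
            if all(block_of(p, x) <= target for p in partitions)}


def optimistic_upper(partitions, universe, target):
    """Complement of the optimistic lower approximation of the complement."""
    return universe - optimistic_lower(partitions, universe, universe - target)


def pessimistic_upper(partitions, universe, target):
    """Complement of the pessimistic lower approximation of the complement."""
    return universe - pessimistic_lower(partitions, universe, universe - target)


# Two hypothetical granular spaces over the same six objects.
U = {1, 2, 3, 4, 5, 6}
P1 = [{1, 2}, {3, 4}, {5, 6}]
P2 = [{1}, {2, 3}, {4, 5, 6}]
X = {1, 2, 3}

opt_low = optimistic_lower([P1, P2], U, X)
pes_low = pessimistic_lower([P1, P2], U, X)
opt_up = optimistic_upper([P1, P2], U, X)
pes_up = pessimistic_upper([P1, P2], U, X)
print(pes_low, opt_low, opt_up, pes_up)
```

In this toy case the four sets are nested: the pessimistic lower approximation is the smallest and the pessimistic upper approximation the largest.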

Absolute support
Given a knowledge base $K = (U, A')$ and $P \subseteq A'$, the support of each $a_i \in A'$ is the size of the equivalence class formed on $a_i$. Alternatively, the support of an item is defined as the count of its occurrences in the database.

Degree of dependence [23]
In real world data, it is quite common that some value combinations are overrepresented, while others are totally missing. In this situation, we cannot make any judgements concerning dependences between attribute sets, but we can still find significant dependencies. In the association rule literature, the relative difference is often defined via another measure called the degree of dependence (dependence [31], degree of independence [32], or interest [7]). The degree of dependence is computed as in Equation (9):
$\gamma(A_1 = a_1, A_2 = a_2) = \dfrac{P(A_1 = a_1, A_2 = a_2)}{P(A_1 = a_1)\,P(A_2 = a_2)}$ (9)
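As a small worked example of the degree of dependence, the sketch below evaluates it from raw counts. The function name and the counts are hypothetical; the probabilities are estimated as relative frequencies, so the ratio simplifies to n_xy * n / (n_x * n_y).

```python
def degree_of_dependence(n_xy, n_x, n_y, n):
    """P(x, y) / (P(x) * P(y)) with probabilities estimated from counts.

    n_xy: transactions containing both values
    n_x, n_y: transactions containing each value
    n: total number of transactions
    """
    if n_x == 0 or n_y == 0:
        raise ValueError("both attribute values must occur at least once")
    return n_xy * n / (n_x * n_y)


# Hypothetical counts: 100 calls, 20 to cafes, 30 on Fridays, 12 to cafes on Fridays.
gamma = degree_of_dependence(n_xy=12, n_x=20, n_y=30, n=100)
print(gamma)
```

A value above 1 indicates that the two attribute values occur together more often than independence would predict; a value of 1 corresponds to independence.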

Significance of dependence [13]
The idea of statistical significance tests is to estimate the probability of the observed or a rarer phenomenon under some null hypothesis. When the objective is to test the significance of the dependency between X = x and Y = y, the null hypothesis is the independence assumption: P(X = x, Y = y) = P(X = x)P(Y = y). If the estimated probability p is very small, we can reject the independence assumption and assume that the observed dependency is not due to chance, but significant at level p. In the case of statistical dependence of an association, this significance is expressed by the metric t [13]; a value of t > 2 indicates a statistically valid dependence between the associated attributes under study.
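To illustrate testing against the independence assumption, the sketch below scores a dependence with a binomial z-score. This z-score is an assumed stand-in chosen for illustration and is not necessarily the exact t metric of [13]; the function name and counts are hypothetical.

```python
import math


def z_score(n_xy, n_x, n_y, n):
    """Binomial z-score of the joint count under the independence null.

    Under independence the joint count is modelled as Binomial(n, p0)
    with p0 = P(x) * P(y), both estimated as relative frequencies.
    """
    p0 = (n_x / n) * (n_y / n)
    expected = n * p0
    std_dev = math.sqrt(n * p0 * (1.0 - p0))
    return (n_xy - expected) / std_dev


# Hypothetical counts out of 100 calls.
independent = z_score(n_xy=25, n_x=50, n_y=50, n=100)  # joint count equals expectation
dependent = z_score(n_xy=40, n_x=50, n_y=50, n=100)    # joint count well above expectation
print(independent, dependent)
```

With a threshold of 2, the first pair would be pruned and the second retained.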

Multi-dimensional intra (association) patterns and inter (sequential pattern)
Given a knowledge base $K = (U, A')$, a multi-dimensional intra pattern is given by the size of the equivalence class formed by equivalence on multiple dimensions of interest. Alternatively, a multi-dimensional intra (association) pattern is characterized by the frequency of the attribute value tuples $(x_i, a'_i, a'_j, a'_k)$ that form the equivalence class based partitions in the knowledge base.

Inter patterns
Inter patterns are attribute tuples $(x_i, (a')_t, (a')_{t+n})$ and are derived from attribute combinations over a temporal span for each $x_i$. They are derived from the frequency of attribute value pairs in the pessimistic lower approximation of multi-granulation rough sets.

Hierarchical structures of partition based multigranulation spaces [15]
Given a knowledge base K = (U, A′) in which R ∈ A′, R is an equivalence relation that induces a partition based granular space. Given ℛ ⊆ A′, the integration of all granular spaces forms a multi-granular space. Formally, a partition based multi-granular space (PBMS) is denoted by K(ℛ) = {U/R : R ∈ ℛ}.

A multi-granulation rough sets based framework for prospecting based on call records
We propose the following method for prospecting based on call records:

Transform U to K
Given an information system U of call records, use a functional mapping to replace attributes such as the called number with their linguistic representation (the business subcategory contacted). Similarly, transform the date/time of the call into two distinct attributes: day of contact and hour of contact.
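A minimal sketch of this transformation is given below. The directory contents, phone numbers, field names and timestamp format are all hypothetical placeholders; the actual call record schema is not specified here.

```python
from datetime import datetime

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

# Hypothetical business directory: called number -> (subcategory, category).
DIRECTORY = {
    "020000001": ("Cafe", "Food"),
    "020000002": ("Bakery", "Food"),
}


def to_knowledge_record(subscriber, called_number, timestamp):
    """Map one raw call record to its linguistic representation.

    Returns None when the called number is not a listed business.
    The timestamp format "%Y-%m-%d %H:%M:%S" is an assumption.
    """
    entry = DIRECTORY.get(called_number)
    if entry is None:
        return None
    subcategory, category = entry
    when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "subscriber": subscriber,
        "subcategory": subcategory,
        "category": category,
        "day": DAYS[when.weekday()],  # weekday() is 0 for Monday
        "hour": when.hour,
    }


record = to_knowledge_record("s001", "020000001", "2015-03-06 19:45:00")
unlisted = to_knowledge_record("s002", "029999999", "2015-03-06 10:00:00")
print(record)
```

Keeping the category alongside the subcategory preserves the concept hierarchy needed later for coarser granulation.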

Partition the knowledge base
The knowledge base so formed is further partitioned into equivalence classes by business subcategory, with additional attributes such as business category and information on the higher taxonomy. Each induced partition gives definite prospects for the business subcategory under study. The size of each partition is the support for the service in that subcategory.
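The partitioning step can be sketched as below. The record layout and the sample data are hypothetical, and support is taken here as the number of records in a partition.

```python
from collections import defaultdict


def partition_by_subcategory(records):
    """Group knowledge-base records into one block per business subcategory."""
    partitions = defaultdict(list)
    for rec in records:
        partitions[rec["subcategory"]].append(rec)
    return dict(partitions)


# Hypothetical knowledge-base records.
records = [
    {"subscriber": "s1", "subcategory": "Cafe"},
    {"subscriber": "s2", "subcategory": "Cafe"},
    {"subscriber": "s1", "subcategory": "Cafe"},
    {"subscriber": "s3", "subcategory": "Bakery"},
]

partitions = partition_by_subcategory(records)

# Support of a subcategory: the size of its partition.
support = {name: len(block) for name, block in partitions.items()}

# Prospects of a subcategory: the distinct subscribers found in its partition.
prospects = {name: {rec["subscriber"] for rec in block}
             for name, block in partitions.items()}
print(support, prospects)
```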

Multi-dimensional intra pattern mining
We use equivalence class based partitioning within each partition to derive day-wise and hour-wise multi-dimensional intra patterns. We compute the degree of dependence and the t value for attribute value combinations. This gives combinations of day and time for specific business subcategories. The patterns are then sorted by support, degree of dependence and t value. The multi-dimensional attribute dependence so identified for each business with day and hour can be used to find optimal and possible days and times for business promotion.
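A simplified sketch of this step follows. It counts (subcategory, day, hour) combinations, computes the degree of dependence between the subcategory and the day-hour slot, and ranks the patterns by support and then dependence. The function name and sample records are hypothetical, and the t value is omitted for brevity.

```python
from collections import Counter


def intra_patterns(records):
    """Rank (subcategory, day, hour) patterns by support, then dependence.

    `records` is a list of (subcategory, day, hour) tuples.
    Returns tuples (subcategory, day, hour, support, degree_of_dependence).
    """
    n = len(records)
    joint = Counter(records)
    by_service = Counter(s for s, _, _ in records)
    by_slot = Counter((d, h) for _, d, h in records)

    patterns = []
    for (s, d, h), support in joint.items():
        # P(s, slot) / (P(s) * P(slot)) estimated from counts.
        dependence = support * n / (by_service[s] * by_slot[(d, h)])
        patterns.append((s, d, h, support, dependence))

    patterns.sort(key=lambda p: (p[3], p[4]), reverse=True)
    return patterns


# Hypothetical knowledge-base records.
calls = ([("cafe", "Fri", 19)] * 3
         + [("cafe", "Sat", 10)]
         + [("bakery", "Sat", 10)] * 2)

ranked = intra_patterns(calls)
for pattern in ranked:
    print(pattern)
```

The top-ranked pattern for each subcategory suggests a candidate day and hour for its promotion.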

Inter pattern mining
The algorithm given by [22] is used for extracting inter patterns of service associations. The same t-value pruning principle is used to derive attribute dependence. The intersection of various partitions gives those subscribers with multiple service interests. The union of all granular spaces on business subcategory under the coarser attribute, business category, gives the optimistic lower approximation, which encapsulates all possible prospects for the business categories (higher order granulation) under study. The pessimistic lower approximations lead to prospects of multiple business subcategories. Those subscribers who exist in all granular spaces of the business subcategories (finer multi-granulation) are positive prospects for the business category, which is the coarser equivalence class. The pseudo code for the mining algorithm is given in Pseudo code 1. A graphic illustration of the inter patterns discovered with the pessimistic lower approximation is given in Figure 3.0.
Partitioning on the basis of business subcategory, day and hour gives the support for multi-dimensional intra patterns or associations. The most significant of these are derived using the pruning criteria of degree of dependence and significance of dependence. If no patterns cross the t > 2 threshold, the next best criteria of best support and degree of dependence are considered. This gives the best combination of patterns for service promotions. From the given example, it is clear that the maximal pattern length is two, and we can infer from any non-empty intersection that subscribers of one service might take up offers from promotions of other related services. While the optimistic lower approximation generates the possible set of prospects, the pessimistic approximations, or inter patterns, give definite takers of multiple services, and the best support for inter patterns gives the best expanded audience for promotion.
Pseudo code 1. Algorithm: Multi-Dimensional Intra Patterns. The inputs are the partitions in the knowledge base on the basis of business sub-category; the algorithm gives the pessimistic lower approximation of X.
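The set operations described for inter patterns can be sketched as follows: pairwise intersections of subcategory partitions give subscribers with multiple service interests, and the union gives the wider set of possible prospects. This is a simplified illustration with hypothetical names and data, not the algorithm of [22].

```python
from itertools import combinations


def inter_patterns(prospects):
    """Non-empty pairwise intersections of per-subcategory subscriber sets."""
    patterns = {}
    for a, b in combinations(sorted(prospects), 2):
        shared = prospects[a] & prospects[b]
        if shared:
            patterns[(a, b)] = shared
    return patterns


def possible_prospects(prospects):
    """Union of all partitions: every subscriber seen in any subcategory."""
    return set().union(*prospects.values())


# Hypothetical subscriber sets per business subcategory.
prospects = {
    "cafe": {"s1", "s2", "s3"},
    "bakery": {"s2", "s4"},
    "nightclub": {"s3", "s5"},
}

pairs = inter_patterns(prospects)
audience = possible_prospects(prospects)

# Rank inter patterns by support, that is, the number of shared subscribers.
ranked = sorted(pairs.items(), key=lambda kv: len(kv[1]), reverse=True)
print(ranked, sorted(audience))
```

In practice the pairs would also be pruned by the dependence measures before ranking.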

Mathematical properties of granular multi-dimensional intra patterns and granular inter patterns
Theorem 1: The space of multi-dimensional intra patterns is coarser on the equivalence class formed by business category, and the relation is antisymmetric and transitive with respect to the finer equivalence relations on business subcategory.
Proof: Since the union of all finer equivalence relations (ℛi) is encapsulated in (ℛ2), which is the coarser equivalence relation, K(ℛi) ≺ K(ℛ2) for all i from 1 to 4 (the business subcategories). From the theorem "Partition based multi-granulation formed by strictly finer and coarser equivalence relation" [15], ≺ is transitive and antisymmetric. This implies that prospects of café are also prospects for all food related businesses. The support of all prospects under the coarser equivalence relation is not the same as the sum of the supports of all prospects under the finer equivalence relations, because K(ℛi1) ∩ K(ℛi2) ≠ ∅.

Theorem 2:
The space of inter patterns derived from the pessimistic lower approximations of the finer multi-granulation [15] consists of larger sets than those formed by the coarser equivalence relations. Also, the upper approximations under the finer multi-granulation are smaller than those under the coarser one.
Proof: This follows from Corollary 1, "Partition-based multi-granulation set formed by strictly finer relation with coarser equivalence class has a bigger pessimistic lower approximation" [15]: if K(ℛ1) ≺ K(ℛ2) then $\underline{\mathcal{R}_1}^{P}(X) \supseteq \underline{\mathcal{R}_2}^{P}(X)$. It also follows from Theorem 1, as the pessimistic lower approximation formed on ℛ2 consists of those subscribers that are present in all finer equivalence relations. This set is smaller than the set of subscribers that are present in at least one of the finer equivalence relations.

Results of intra and inter pattern mining on call records
The proposed method was tested on anonymised call records, tracked for one month, to 7765 businesses listed in a local business directory under four subcategories of the "Food" category, namely Restaurant, Bakery, Café and Food Ordering Services. Neither demographic nor SMS data was used or captured for our analysis. The details of the generated prospects carried only pseudo-identifiers. This data can be used by the provider for promotional exercises only with mutual consent for using such information. The call records had only date and time stamp information for the business sites under study. Moreover, call logs at subscriber resolution have been used to mine nuggets of knowledge [24], [25] with due consideration of user privacy. Also, only the knowledge spaces as in Table 2 were used. The scope of the study was extended to other business subcategories in the entertainment taxonomy, such as beauty and health, spa, travel agency and nightclub, to study the inter pattern space. The support for maximal patterns is given in Figure 5.0. It is evident that, using the proposed method, we can find those subscribers who can be directly targeted for the services they have availed. Further, using the concept of optimistic lower approximation, a bigger audience, namely those who have availed a related service, can also be contacted for promotion.

Discussion
The proposed method has lower time complexity and a smaller, statistically valid result space that is directly usable for decision making.

Time complexity comparison
Intra patterns can be derived from call records with traditional multi-dimensional intra pattern mining, using efficient variants of apriori methods [26] at user-defined support and confidence thresholds and embedding day and hour information in individual transactions. Inter patterns of service associations can be mined in the same way by collating the sequence of customer contacts over a given period into a transaction. The time complexity of a hash-tree-like approach is $O\left(N \sum_k \binom{w}{k} \alpha_k\right)$ [27], where N is the number of transactions, w is the maximum transaction width and $\alpha_k$ is the cost of updating the support count of a candidate k-itemset in a hash tree. Using the proposed method of "Multi-Granulation Rough Set based Multi-dimensional Intra and Inter pattern mining", the time complexity is the sum of the cost of database partitioning, computing lower approximations and computing the pessimistic lower approximation, which is $O(m\,|N/k_i|^2) + O(m \times k_i \times \log k_i)$. Here m is the number of partitions and $N/k_i$ is the size of the dataset in a partition. This is lower than for traditional methods.

Comparison of rule space and directly usable pattern discovery
The proposed method addresses Type I and Type II statistical errors in pattern discovery. We derive a non-redundant pattern space with granulation; in addition, all knowledge about pattern dependencies is encapsulated in higher order concepts. Given d unique items in the database under study, the total number of itemsets is $2^d$ and the total number of possible rules is $3^d - 2^{d+1} + 1$ [27]. In the presented application, all rules are important, since multi-dimensional patterns of the most frequent day of contact and most frequent time of contact are desired for all businesses under study. With optimistic and pessimistic granular spaces, d unique items are encapsulated in m unique higher order concepts, where m ≤ d. In our case, 32 business subcategories are mapped into 5 unique categories. Also, pruning by degree of dependence and significance of dependence identifies statistically valid dependencies in inter pattern mining. Simply ordering the derived rules by support gives the desired patterns for designing business promotion strategies.
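The effect of mapping 32 subcategories into 5 categories on the size of the rule space can be checked with a short calculation based on the two counting formulas above. The function names are hypothetical.

```python
def itemset_count(d):
    """Total number of itemsets over d unique items: 2^d."""
    return 2 ** d


def rule_count(d):
    """Total number of possible association rules over d items: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


fine = rule_count(32)    # one item per business subcategory
coarse = rule_count(5)   # one item per business category

print(f"itemsets: {itemset_count(32)} vs {itemset_count(5)}")
print(f"rules:    {fine} vs {coarse}")
```

For 5 categories the formula gives 243 - 64 + 1 = 180 possible rules, many orders of magnitude fewer than for 32 subcategories.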

Conclusion
It is evident that all the properties of coarser and finer information granulation hold for the pattern space of multi-dimensional intra patterns. The proposed method has a lower computation cost and also addresses issues related to the Type I and Type II errors common to the support-confidence framework. Further, the proposed architecture is scalable owing to partitioning and granulation. Future work includes extending the presented idea to incremental mining of the intra and inter pattern space. The idea will also be tested on problems in environmental sciences and spatial data mining. The applicability of the proposed method is limited to domains where a logical concept hierarchy exists among the data attributes under study.