Comparison of the Community Structure Partition Optimization of Complex Networks with Different Community Discovery Algorithms

Complex problems can be transformed into complex networks. Through the community partition of complex networks, the relationship between nodes can be found more clearly. This paper briefly introduces three algorithms for community structure partition of complex networks, which were based on the similarity of common neighbor nodes, ant colony algorithm and density peak clustering, and compared the performance of the three algorithms by using six artificial networks whose chaotic factors gradually increased as well as two real networks in MATLAB software. The results suggested that the increase of chaotic factors in the artificial network reduced the normalized mutual information (NMI) of the partition results calculated by the three algorithms. However, the NMI of the algorithm based on density peak clustering in the same artificial network was the highest, the algorithm based on ant colony algorithm followed, and the algorithm based on the similarity of common neighbor nodes performed the worst. For a real example network, the modularity of the algorithm based on density peak clustering was the highest, the algorithm based on ant colony algorithm was the second, and the algorithm based on the similarity of common neighbor nodes was last. In conclusion, the fuzzier the community structure is in the complex network, the lower the performance of the partition algorithm is, and the algorithm based on density peak clustering has the best performance.


Introduction
Real life is composed of many complex problems. Studying the laws that govern them can help solve them, which can in turn promote social development [1]. However, the complexity of real-life problems makes the surface laws that can be found intuitively have little value, and those surface laws have no fundamental impact on the solution of problems. In order to mine the hidden information, the complex problem is transformed into a complex network. Nodes in a complex network represent the individuals participating in it, and line segments between nodes represent the relationships between them [2]. For example, the rapid development of the Internet can be seen as a complex network. The participating users using the Internet have different identities. They will search the information on Internet according to their own interests and hobbies. Users with different identities will gradually focus on the same or similar interests, thus forming a community structure [3]. Similarly, the structure of a protein can also be seen as a complex network, where genes are nodes. Through community division, genes with similar functions can be summarized, so as to mine the effective information of genes. Through the division of community structure in the complex network, the information contained in the complex network can be more clearly understood, so as to analyze the topological properties and organizational structure of the complex system. Liu et al. [4] put forward a weighted maximum fitness algorithm which optimized the initial node according to the potential energy idea, simplified the node fitness function, and expanded the community according to the potential energy queue. The simulation results showed that the algorithm had higher accuracy and shorter calculation time than the maximum fitness algorithm. Zuo et al. [5] detected the similar energy behavior nodes in the sensor node network using complex network community division algorithm and selected the cluster center and hop nodes using the immune response principle, so as to realize the energy-saving topology of the sensor network. The experimental results showed that the method could reduce the energy consumption. Huang et al. [6] put forward a software network optimal partition method based on the dependency between software functions and verified through experiments that the method could effectively detect the optimal communities in various software. This paper briefly introduces three algorithms for the community structure division of complex networks, which were based on the similarity of common neighbor nodes, ant colony algorithm and density peak clustering algorithm, and compared the performance of the three algorithms by using six artificial 2 Multiple community partition algorithms

Community partition based on the similarity of common neighbor nodes
There are many community partition algorithms that can be used in complex networks. Firstly, the community partition algorithm based on the similarity of common neighbor nodes is introduced. The schematic diagram is shown in Figure 1. Black square point A is any point in community A, black square point B is any point in community B, black circle points are the adjacent node of node A and node B, and hollow circle points are the common adjacent node of node A and node B, then the relationship of nodes in Figure 1 [7] can be expressed as: represents a set of nodes belonging to community B but not belonging to community A. According to the above relationship model, the similarity of two nodes can be calculated: The flow of community partition algorithm based on the similarity of common neighbor nodes is shown in Figure 2. ② According to equation (2), the similarity matrix of the common neighbor nodes is calculated. In the initial network, each node is regarded as a community, which gradually aggregates into different communities in the following iteration. ③ The local influence of each node in the network is calculated [8]: where i L stands for the local influence of node i in the network, i  is the set of neighbor nodes of node i, j  is the secondary neighbor node set of node i, and v N is the degree of the secondary neighbor node of node i. The node with the largest local influence is selected from the network and set as current node i. ④ Node j with the highest similarity with current node i is selected from the similarity matrix of the common neighbor nodes, and then node i is removed from node set V used for calculation to represent that the node has been checked. ⑤ The communities to which current node i and highest similarity node j belong to are determined. If they are the same, step ③ is repeated to select the new current node from the remaining undetected nodes. If they are not the same, the communities of the two nodes are merged, and node j is set as current node i. ⑥ Whether the nodes in the network are traversed is determined, and step ③ ~ ⑤ are repeated if they are not traversed; after traversing, modularity Q of the current network community structure is calculated. ⑦ After merging any two communities in the current network, the modular degree of the community structure is calculated after merging, and then the merging scheme with the largest modular degree is selected. Modular degree Q' under the selected scheme is compared with modular degree Q before merging. If Q' is greater than Q, the network community structure is updated according to the merging scheme, and it is repeated until Q' is not greater than Q. Finally the result is output.

Community partition based on the ant colony algorithm
In addition to the common neighbor similarity method described above, the ant colony algorithm can also be used to divide communities in complex networks. In the community partition of complex network, the ant colony algorithm [9] regards the nodes in the network as ants and then imitates the moving form of ants in foraging to transfer the location of network nodes, so as to realize the node aggregation and community partition. The flow of community division based on the ant colony algorithm is shown in Figure 3.
② The ant position is initialized: motif i u which is composed of core node and neighbor ordinary nodes is labeled as , the two motifs are merged as one community; otherwise independent as a community until all the merging is over. Then the nodes in the network are mapped as ants in the ant colony algorithm: ③ The ants (nodes) in the initially divided network move to the adjacent motif according to the probability transfer formula, and the probability transfer formula [10] is: ④ After the ant is moved, the pheromone of the path is updated [11], and the update formula is: where  is the evaporation rate of pheromone, m is the number of ants, and Q is the probability that the p-th ant chooses the path. ⑤ The WCC ratio of ants before and after moving is calculated according to equation (4). If the ratio is not greater than the set threshold, the label of ants will increase (the node will be added to the target module); otherwise, the label of ants will remain unchanged (the node still belongs to the original module). ⑥ Whether the algorithm meets the termination condition is determined: after the maximum number of iterations is reached, steps ③ ~ ⑤ are repeated if the termination condition is not satisfied; the partition result is output if the termination condition is satisfied.

Community partition based on density peak clustering
The basic principle of community partition based on density peak clustering is to select cluster center nodes from the complex network and then assign non-cluster center nodes to different cluster center nodes according to the distance between nodes, so as to achieve the effect of community partition. The criteria for judging whether a node is a cluster center are: ① the local density of the node itself is large enough; ② the distance between the node and other nodes with sufficient density is large enough. The formula [12] for calculating the local density and relative distance of nodes in a complex network is as follows:   ② the distance between any two nodes in the complex network is calculated. In the complex network, there may not only be one path between any two nodes, so it will affect the calculation of node distance. In this study, the equivalent resistance theory is used to integrate the multipath between nodes, and the node distance is calculated m is the number of valid paths between nodes. As mentioned above, the path between nodes in a complex network is mostly more than one. If each path is regarded as a "resistance", the circuit formed by the network will have the problem of interleaving of parallel connection and series connection, which greatly increases the difficulty of calculation. Therefore, based on the small world effect of a complex network, the path with a small amount of information with a distance greater than 2 is eliminated, so as to reduce the amount of calculation and avoid the influence of interleaving of parallel connection and series connection. ③The local density and relative distance of each node in the network is calculated according to equation (7), and then the decision diagram of node is drawn by taking the local density and relative distance as the horizontal and vertical coordinates. Each node in the decision diagram will present the situation of their own aggregation according to the nearness degree of horizontal and vertical coordinates. Generally speaking, non-cluster center nodes will gather, and the scattered nodes can be regarded as cluster centers. ④After drawing the decision-making graph of the nodes, the non-clustered and scattered nodes can be identified as cluster centers. Usually, the cluster centers are determined manually, but the efficiency is too low, and the subjectivity is too strong. If the nodes are clustered clearly, the problem is not serious, but if the degree of clustering is low, subjectivity bias is too strong, which is not conducive to community partition. Therefore, the cluster center in the decision graph is solved using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm in this study. DBSCAN algorithm is a clustering algorithm based on density. Its calculation steps are as follows. Firstly, an unprocessed point is selected from the decision diagram, and other points within the scanning radius range of the point are searched. If the number of other points exceeds the set minimum number, that point and the other points get together to form a cluster, and the point is labeled as "having been processed", and then the other points in the cluster are processed in the same way to expand the cluster until all the points in the cluster are processed. If the number of points does not reach the set minimum number, that point is labeled as a noise point, and the above steps are repeated until all the points are processed. The noise points, i.e., clustering center nodes used for density peak clustering, are output.
⑤ Taking the cluster center as the standard, the rest nodes are assigned to the nearest cluster, and the community partition results are output finally.

Experimental environment
In this study, the three algorithms were compiled using MATLAB software, and simulation experiments were carried out. The experiments were carried out in the laboratory server with configurations of Windows 7, Core i7 CPU, and 16 GB memory.

Experimental data
In this study, LFR tool was used to generate artificial network data sets. The relevant parameters are shown in Table 1. Six artificial network data sets were constructed. The total number of nodes (N) in each artificial network data set was 2000, the average nodal degree k was 30, the maximum nodal degree was 60, the nodal degree distribution parameter t1 was 2, the community size distribution parameter t2 was 1, the maximum size of community was 150, and the minimum size of community was 25. The difference between the six artificial network data sets lied in the blend factor mu, which represents the probability that a node in a community was connected to an external community node, which increases gradually from 0.2 to 0.7.
In order to further test the effectiveness of the three community partition algorithms, in addition to the artificial network data set, the real network data set was also used for simulation experiments. As shown in Table  2, football refers to the American college football network, and email refers to the email communication network of University of Spain. The above two networks are common real network data sets used for detecting the performance of community partition algorithm. Among them, the nodes of the football network represents a team, totally 115 nodes, and the edges represents having   competed with other teams, totally 616 edges. Moreover, the network belongs to the labeled network, each team belongs to a clear union, that is, the community structure is known. The nodes of the email network represents university students in the network, totally 1133 nodes, and the edges represent having communicated, totally 5451 edges. Furthermore, the network belongs to the unlabeled network. Due to privacy protection, the personal information and social contact of users in the network are unknown, that is, the community structure is unknown.

Algorithm performance evaluation index
Normalized mutual information (NMI) [14] can be used to compare the similarity between the actual community structure with the community structure divided by the algorithm, and its calculation formula is: X .
An artificial complex network is a set of network data constructed artificially by the LFR tool, so its actual community partition is known. The performance of the algorithm can be measured by NMI, but for the real-life network, the actual community partition is often unknown, such as the email network used in the experiment in this study. Therefore the partition effect of the unlabeled network by the algorithm is evaluated using modularity [15]:

Experimental results
As shown in Figure 5, the NMI of the community partition algorithm based on the similarity of common neighbor nodes for LFR1~LFR6 artificial network was 0.82, 0.77, 0.72, 0.49, 0.35, and 0.28 respectively; the NMI of the community partition algorithm based on the ant colony algorithm for LFR1~LFR6 artificial network was 0.9, 0.87, 0.82, 0.61, 0.49, and 0.4 respectively; the NMI of the community partition algorithm based on density peak clustering for LFR1~LFR6 was 0.98, 0.96, 0.93, 0.74, 0.6 and 0.5 respectively. In this study, the artificial network LFR1~LFR6 improved the blend factor mu gradually, the network community structure became fuzzy gradually, and the NMI of the three algorithms for community structure partition gradually reduced; moreover, in the process of mu increase, under the same artificial network, the community partition algorithm based on density peak clustering had the largest NMI, followed by the community partition algorithm based on the ant colony algorithm and the community partition algorithm based on the similarity of common neighbor nodes.
As mentioned above, due to the large number of participating nodes, real networks have complex connections with each other. When complex networks are analyzed using the community partition algorithm, it is to obtain effective information from the unknown complex networks. Therefore, the community structure of real networks is often unknown. NMI is based on the actual structure of the known network, so it is not suitable for the evaluation on the partition effect of real networks by algorithm. In this study, the stability of the community structure partition by the algorithm was measured by modularity, so as to evaluate the partition effect of the algorithm. As shown in Figure 6, for the football network, the community structure modularity of the community partition algorithm based on the similarity of common neighbor nodes was 0.43, that of the community partition algorithm based on the ant colony algorithm was 0.58, and that of the community partition algorithm based on density peak clustering was 0.69. For the email network, the community structure modularity of the community partition algorithm based on the similarity of common  neighbor nodes was 0.4, that of the community partition algorithm based on the ant colony algorithm was 0.56, and that of the community partition algorithm based on density peak clustering was 0.68. The algorithm based on density peak clustering had the highest modularity of community structure, followed by the ant colony algorithm and the algorithm based on the similarity of common neighbor nodes. That is to say, the community structure was the most stable and most similar to the actual structure when the real network was divided by the algorithm based on density peak clustering.

Conclusion
In this study, three algorithms for the community structure partition of complex networks were introduced, which were based on the similarity of common neighbor nodes, the ant colony algorithm and the density peak clustering algorithm. Then, the performance of the three algorithms was compared by using six artificial networks whose chaotic factors gradually increased and two real networks in MATLAB software. The results are as follows. (1) With the increase of the chaotic factors in the artificial network, the NMI of community partition of the three algorithms gradually decreased, but the NMI of the algorithm based on density peak clustering in the same artificial network was the highest, the algorithm based on the ant colony algorithm was the second highest, and the algorithm based on the similarity of the common neighbor nodes was the lowest.
(2) In the real network, the partition performance of the algorithms was evaluated by the modularity, among which the algorithm based on density peak clustering still had the highest modularity, the algorithm based on the ant colony algorithm was the second highest, and the algorithm based on the similarity of common neighbor nodes was the lowest in modularity.