Application of Algorithms with Variable Greedy Heuristics for k-Medoids Problems

Progress in location theory methods and clustering algorithms is mainly aimed at improving the performance of the algorithms. The most popular clustering models are based on solving the p-median and similar location problems (k-means, k-medoids). In such problems, the algorithm must find several points (called cluster centers, centroids, or medoids, depending on the specific problem) that minimize some function of the distances from known objects to the centers. In the k-medoids problem, each cluster center (medoid) must coincide with one of the clustered objects. The problem is NP-hard, and the efforts of researchers are focused on the development of compromise heuristic algorithms that provide a fairly quick solution with minimal error. In this paper, we propose new algorithms of the Greedy Heuristic Method which use the idea of Variable Neighborhood Search (VNS) for solving the k-medoids problem (also called the discrete p-median problem). In addition to the known PAM (Partitioning Around Medoids) algorithm, neighborhoods of a known solution are formed by applying greedy agglomerative heuristic procedures. According to the results of computational experiments, the new search algorithms (Greedy PAM-VNS) give more accurate and stable results (a lower average value of the objective function and its standard deviation, a smaller spread) in comparison with known algorithms on various data sets.


Introduction
The rapid development of artificial intelligence systems that use, among other techniques, methods of automatic data grouping (clustering) and methods of location theory, together with increasing requirements for economic efficiency in all industries, creates a demand for new algorithms with higher accuracy of the result.
Attempts to discover a universal and, at the same time, exact method for solving the most popular location and clustering problems (k-means, k-medoids, etc.), one that guarantees the global optimum of the objective function for a large amount of input data, have been recognized as unpromising. The efforts of researchers have focused on the development of compromise heuristic algorithms that give a quick solution [1]. Heuristic algorithms or procedures, also called "heuristics" in the literature, are algorithms that do not have a rigorous justification but give an acceptable solution to practically important problems. The so-called "greedy" algorithms are also heuristics: at each iteration, a greedy algorithm selects the best solution from a certain neighborhood (a subset of intermediate solutions). At the same time, some practically important clustering problems require a solution which is very close to the exact solution of the problem, and also stable over repeated runs of a randomized algorithm, reproducible and, therefore, verifiable. Such problems should be solved online within a limited time. They include, for example, the problem of forming special batches of semiconductor devices in specialized testing centers [1], where the need for stable results is due to the requirement of reproducibility and verifiability of calculation results that are part of a production process involving two parties with different interests: the manufacturer and the test center.
The ensemble (collective) approach [2] allows reducing the dependence of the final decision on the selected parameters of the original models and algorithms and obtaining a more stable solution [3], or isolating "controversial" objects for which different clustering models give a contradictory result, into a separate class. The k-medoids problem is a convenient model for building clustering algorithm ensembles due to the adaptability of the model to the use of various measures of the distance between objects.
The overall aim of a continuous location problem [4] is to find the location of one or several points (centers, centroids, medoids) in continuous space. There is an intermediate class of problems that are actually discrete (the number of possible locations of the searched points is finite) but operate with concepts characteristic of continuous problems. In particular, such is the k-medoids problem [5,6] (also called the discrete p-median problem [7] in the scientific literature). The main parameters of all such problems are the coordinates of the objects and the distances between them [8][9][10]. The aim of the continuous p-median problem [8] is to find k points (centers, centroids, medians, cluster medoids) such that the sum of weighted distances from N known points, called demand points, consumers, objects or data vectors depending on the formulation of a specific problem, to the nearest of the k centers reaches its minimum.
The location problems with Euclidean, Manhattan (rectangular), and Chebyshev metrics are well studied (all these metrics are particular cases of metrics based on Minkowski lp-norms [11]), and many algorithms have been proposed for solving the Weber problem in these metrics. In particular, the well-known Weiszfeld procedure [12] was generalized to metrics based on Minkowski norms.
The problem is to find the set of centers X1, …, Xk minimizing the objective function

F(X1, …, Xk) = Σi=1..N minj=1..k d(Ai, Xj),

where d(·,·) is the distance function and A1, …, AN are the known points. If the distance is Euclidean, we have the p-median problem; if d is the squared Euclidean distance, we have the k-means problem. In the k-medoids model, the cluster centers Xj, called medoids, are searched for among the known points Ai, so this is a discrete optimization problem.
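To make the model concrete, the objective function above can be evaluated as follows (a minimal Python sketch; the function and variable names are illustrative, not from the paper):

```python
import math

def kmedoids_objective(points, medoid_indices):
    """Sum of distances from every object to its nearest medoid.

    points: list of coordinate tuples (the objects A_i);
    medoid_indices: indices into `points` selecting the k medoids,
    since in the k-medoids model medoids coincide with data points.
    """
    return sum(
        min(math.dist(p, points[m]) for m in medoid_indices)
        for p in points)
```

With the Euclidean distance this is the discrete p-median criterion; replacing `math.dist` with its square would yield the k-means criterion instead.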
The most popular algorithm for the k-medoids problem, the Partitioning Around Medoids (PAM) algorithm, was created by L. Kaufman and P. J. Rousseeuw [13]. It is very similar to the k-means algorithm: both algorithms partition a set of objects into groups (clusters), and both try to minimize the error (total distance) at each iteration. The PAM algorithm works with medoids, objects that are part of the original set and represent the cluster in which they are included, while the k-means algorithm works with centroids, which are artificially created objects representing a cluster. The PAM algorithm divides a set of N objects into k clusters (k is a parameter of the algorithm). The algorithm operates on a pre-calculated matrix of distances between objects; its aim is to minimize the distance between the medoid of each cluster and the other objects included in the same cluster.
For discrete optimization problems, local search methods are the most natural and intuitive [14]. Such problems include location problems, network design, scheduling, etc. [15][16][17][18]. The standard local descent algorithm starts with some initial solution x0 (in our case, the initial set of medoids) chosen randomly or with the use of some additional algorithm. At each step of the local descent, the current solution is transformed into a neighboring solution with a smaller value of the objective function until a local optimum is reached. At each step, the neighborhood function O defines a set of possible directions of local search. Very often this set consists of several elements, and there is a certain freedom in choosing the next solution. On the one hand, when choosing a neighborhood, it is desirable to have a set O(X) as small as possible in order to reduce the complexity of a single step. On the other hand, a wider neighborhood can lead to a better local optimum. A possible way to resolve this contradiction is to develop complex neighborhoods whose size can be varied during the local search [19].
In this paper, we propose the use of local search algorithms that contain greedy agglomerative heuristic procedures, as well as the well-known PAM algorithm, using an idea of the Variable Neighborhood Search (VNS) [20]. It is shown that new VNS algorithms have advantages over the standard PAM algorithm and are competitive in comparison with the known genetic algorithms of the Greedy Heuristics Method for the considered problem [21].

Idea of new algorithms
Local search methods have been further developed into metaheuristics [22]. We consider one of them, called Variable Neighborhood Search [23,24]. The idea is to systematically vary the neighborhood function during a local search. Flexibility and efficiency explain its competitiveness in solving NP-hard problems, in particular, p-median problems [25] and clustering and location problems [26,27].
Let us denote by Nk, k = 1, …, kmax, the finite set of neighborhood functions preselected for local search. The proposed method with variable neighborhoods relies on the fact that a local minimum in one neighborhood is not necessarily a local minimum in another neighborhood, while the global minimum is a local minimum in all neighborhoods [14]. In addition, on average, local minima are closer to the global minimum than a randomly selected point, and they are located close to each other. This allows us to narrow the search area for the global optimum using information about the local optima already detected. This hypothesis forms the basis for various crossover operators in genetic algorithms [28] and other approaches.
The deterministic local descent with variable neighborhoods (VND) implies a fixed order of changing neighborhoods and finding a local minimum with respect to each of them. Probabilistic local descent with variable neighborhoods differs from the VND method by a random selection of points from the neighborhood Ok(X), Ok ∈ Nk; the stage of finding the best point in the neighborhood is omitted. Probabilistic algorithms are most productive in solving problems of large dimension, when the use of a deterministic version requires too much machine time to perform one iteration.
The basic local search scheme with variable neighborhoods is a combination of the two previous options [23].

VNS algorithm
Step 1. Choose the neighborhoods Ok, k = 1, …, kmax, and the starting point x.
Step 2. Repeat until the stopping criterion is satisfied:
2.1. Shaking: select a point x' from the current neighborhood of x.
2.2. Apply a local descent from the starting point x'. The resulting local optimum is denoted by x''.
2.3. If x'' improves on x, accept it (x ← x'') and return to the first neighborhood; otherwise, switch to the next neighborhood.
We can use a time limit or a maximum number of iterations as the stopping criterion. In the case of large-scale problems, the complexity of performing one iteration becomes very large, and new approaches are needed to develop effective local search methods.
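The scheme above can be expressed as a generic loop. This is an illustrative sketch of the standard VNS template, where the problem-specific parts (objective, shaking procedures, local descent) are supplied as functions:

```python
import random

def vns(objective, initial, neighborhoods, local_descent, max_iter=100):
    """Basic Variable Neighborhood Search scheme (illustrative sketch).

    neighborhoods: list of functions; neighborhoods[k](x) returns a
    point from the k-th neighborhood of x (the "shaking" step).
    local_descent: improves a solution locally and returns it.
    """
    x = local_descent(initial)
    for _ in range(max_iter):
        k = 0
        while k < len(neighborhoods):
            x_shaken = neighborhoods[k](x)         # shaking
            x_local = local_descent(x_shaken)      # local descent
            if objective(x_local) < objective(x):  # move and restart
                x = x_local
                k = 0
            else:                                  # try a wider neighborhood
                k += 1
    return x
```

In the k-medoids setting, a solution would be a set of medoids, the local descent would be the PAM procedure, and the neighborhoods would be formed by the greedy agglomerative procedures described later.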
A popular idea in solving continuous clustering problems is the use of genetic algorithms (GA) and other evolutionary approaches to improve the results of local search [29][30][31]. Many of these evolutionary algorithms recombine initial solutions obtained by one of the simple local search algorithms.
The PAM procedure consists of two phases, BUILD and SWAP:
- In the BUILD phase, primary clustering is performed, during which k objects are successively selected as medoids.
- The SWAP phase is an iterative process in which the algorithm attempts to improve some of the medoids. At each iteration, a pair (a medoid and a non-medoid) is selected such that replacing the medoid with the non-medoid object gives the best value of the objective function (the sum of the distances from each object to the nearest medoid). The procedure for changing the set of medoids is repeated as long as the value of the objective function can be improved.
Algorithm 1. PAM.
Initialization (if needed):
1. Select k objects as the set of medoids (if such a set is not given).
2. Build the distance matrix if needed.
Build phase:
3. Assign each object to the nearest medoid.
Swap phase:
4. For each cluster, find the objects that reduce the average distance and, if there are such objects, select the one that reduces it most strongly as the new medoid.
5. If at least one medoid has changed, return to Step 3; otherwise, stop the algorithm.
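As an illustration, the listed steps can be sketched in Python. Note that this is a simplified alternating variant built directly from steps 1-5 above, not the full BUILD/SWAP implementation of Kaufman and Rousseeuw; all names are ours:

```python
import math
import random

def pam_like(points, k, seed=0, max_iter=100):
    """Simplified PAM-style clustering following the listed steps:
    assign objects to the nearest medoid, then re-pick each cluster's
    medoid as the member minimizing total within-cluster distance."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)   # step 1: initial medoids
    clusters = {}
    for _ in range(max_iter):
        # step 3: assign each object to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: math.dist(p, points[m]))
            clusters[nearest].append(i)
        # step 4: inside each cluster, pick the object that minimizes
        # the total distance to the other cluster members
        new_medoids = []
        for members in clusters.values():
            best = min(members, key=lambda c: sum(
                math.dist(points[c], points[i]) for i in members))
            new_medoids.append(best)
        # step 5: stop when no medoid changed
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids), clusters
```

A distance matrix (step 2) could be pre-computed to avoid repeated `math.dist` calls; it is omitted here to keep the sketch short.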
The following algorithm is the basic algorithm of the Greedy Heuristics Method [1] for clustering problems. Its idea is that initially some infeasible solution with an excessive number of centers (medoids) is selected, which is then gradually reduced to a solution with the required number of centers.

Algorithm 2. Basic greedy agglomerative heuristic procedure.
Required: the initial number of clusters K and the required number of clusters k < K.
1. Randomly choose an initial solution with K cluster centers S = {X1, …, XK} if it is not given.
2. Execute Algorithm 1 with the initial solution S and store the new (improved) solution to S.
3. While |S| > k, remove from S the center whose elimination results in the smallest increase of the objective function, and execute Algorithm 1 with the reduced set as the initial solution.

Algorithm 3. Greedy procedure #1.
Required: two known solutions S' and S''.
1. For each element of S'':
1.1. Form the set S as the union of S' and this element.
1.2. Run the basic greedy heuristic (Algorithm 2) with S as the initial solution. The result obtained (the resulting set, as well as the value of the objective function) is memorized.
2. Return the best (by the value of the objective function) solution among the solutions obtained in step 1.2.
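The elimination idea behind the basic greedy agglomerative procedure can be sketched as follows (an illustrative Python sketch assuming Euclidean distances; the intermediate runs of the PAM local search after each removal, prescribed by Algorithm 2, are omitted for brevity):

```python
import math

def objective(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(
        min(math.dist(p, points[m]) for m in medoids) for p in points)

def greedy_agglomerative(points, medoids, k):
    """Starting from a solution with K > k medoids, repeatedly drop
    the medoid whose removal increases the objective least, until
    exactly k medoids remain."""
    medoids = list(medoids)
    while len(medoids) > k:
        # evaluate every one-element removal and keep the cheapest
        medoids = min(
            ([m for m in medoids if m != r] for r in medoids),
            key=lambda subset: objective(points, subset))
    return medoids
```

In the full method, a PAM descent would be executed after each removal to let the remaining medoids readjust before the next elimination.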
A simpler, but more demanding in terms of computing resources, version of a similar algorithm is presented below.

Algorithm 4. Greedy procedure #2.
Required: two known solutions S' and S''.
1. Form the set S as the union of S' and S''.
2. Run Algorithm 2 with S as the initial solution.

An intermediate variant is suggested in [21]. There, Algorithm 2 starts from the set S' united with a randomly chosen subset of S''.

Algorithm 5. Greedy procedure #3.
Required: two known solutions S' and S''.
1. Choose the number of repetitions.
2. Repeat this number of times:
2.1. Form the set S as the union of S' and a randomly chosen subset of S''.
2.2. Run Algorithm 2 with this set S as the initial solution.
3. Return the best (in terms of the objective function) solution among the solutions obtained in step 2.2.
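The way Algorithms 4 and 5 form the enlarged candidate set can be illustrated as follows (the helper name and the `variant` parameter are our assumptions, not the paper's notation; the enlarged set would then be reduced back to k medoids by Algorithm 2):

```python
import random

def union_neighborhood(parent, other, variant="full", rng=None):
    """Form the enlarged medoid set used by the greedy recombination
    procedures: procedure #2 unites the whole second solution with the
    parent, while the intermediate variant from [21] unites only a
    randomly chosen subset of it."""
    rng = rng or random.Random()
    if variant == "full":          # greedy procedure #2
        extra = set(other)
    else:                          # intermediate variant (procedure #3)
        size = rng.randint(1, len(other))
        extra = set(rng.sample(list(other), size))
    return set(parent) | extra
```

Procedure #1 corresponds to calling this with a single element of S'' at a time, which keeps each reduction cheap at the cost of more calls to Algorithm 2.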
These heuristic procedures, which are local search algorithms in a neighborhood of a known ("parent") solution represented by the set S', can be used as part of various global search strategies. The neighborhoods of the solution S' are formed by adding elements from another known solution S'' and eliminating the excessive elements from the united solution using the basic greedy agglomerative heuristic procedure. The search in such neighborhoods is performed by Algorithms 3-5. Thus, these algorithms search in certain neighborhoods of the solution S', and the second known solution S'' is a randomly selected parameter of this neighborhood. The general idea of the new Variable Neighborhood Search algorithms for solving the k-medoids problem is given below.

Algorithm 6. PAM-VNS.
1. Run Algorithm 1 from a random initial solution, and store the resulting set of medoids to S.
2. Assign O ← Ostart // Ostart is the initial neighborhood type.
3. Assign i ← 0, j ← 0 // the numbers of unsuccessful iterations in a particular neighborhood and by the algorithm as a whole.
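Since the full listing of Algorithm 6 is not reproduced here, the following schematic loop shows one plausible reading of steps 1-3 together with the neighborhood-switching logic; the parameter names (`max_fail`, `max_iter`) and the exact switching rule are our illustrative assumptions, not the paper's algorithm:

```python
import random

def pam_vns(initial_solution, local_search, neighborhoods, objective,
            max_fail=5, max_iter=50):
    """Schematic PAM-VNS-style loop (illustrative assumption).

    neighborhoods: list of recombination procedures; each takes
    (current_solution, random_solution) and returns a candidate.
    """
    best = local_search(initial_solution())       # step 1: PAM from random start
    o = 0                                         # step 2: O <- Ostart
    fails = 0                                     # step 3: failure counter
    for _ in range(max_iter):
        # form a neighbor of `best` parameterized by a fresh random solution
        candidate = neighborhoods[o](best, initial_solution())
        candidate = local_search(candidate)
        if objective(candidate) < objective(best):
            best, fails = candidate, 0
        else:
            fails += 1
            if fails >= max_fail:                 # switch neighborhood type
                o = (o + 1) % len(neighborhoods)
                fails = 0
    return best
```

Here `local_search` would be the PAM procedure (Algorithm 1) and `neighborhoods` would contain the greedy procedures #1-#3 (Algorithms 3-5).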

Computational experiments
In the description of the computational experiments, we used the following abbreviations of the algorithm names: PAM is the classical PAM algorithm in multi-start mode; PAM-VNS1, PAM-VNS2, PAM-VNS3, PAM-VNS1-R, PAM-VNS2-R, and PAM-VNS3-R are variations of Algorithm 6; GA-FULL is the genetic algorithm with a greedy heuristic for the k-medoids problem [1]; GA-ONE is a new genetic algorithm with a greedy heuristic [1] in which Algorithm 3 is used as the crossover procedure.
As test data, we used the results of non-destructive tests of prefabricated production batches of semiconductor devices (Tables 1-7), as well as data sets from the UCI repository [32] and the Clustering Basic Benchmark [33]. In our experiments, we used a DEXP computing system (4-core Intel® Core™ i5-7400 CPU, 3.00 GHz, 8 GB of RAM).
For each data set, 30 attempts were made with each of the 9 algorithms, and in every attempt we recorded the best achieved result. The best values of the objective function (minimum value, mean value, median value and standard deviation) are highlighted in bold italics, and the smallest of the best values is additionally highlighted. We used the t-test and the Wilcoxon signed rank test [34,35] (significance level 0.01 for both tests). Notation: "↑", "⇑": the advantage of the best of the new algorithms over known algorithms is statistically significant ("↑" for the t-test, "⇑" for the Wilcoxon test); "↓", "⇓": the disadvantage of the best new algorithms compared to known algorithms is statistically significant; "↕", "⇕": the advantage or disadvantage is statistically insignificant. Table 4 presents the results of the new algorithm in comparison with known evolutionary algorithms that have worked well in solving this problem [1].
In Table 4, we use the following abbreviations [1]: GA is a genetic algorithm with a uniform stochastic crossover procedure; GAGH is the genetic algorithm with greedy heuristic #3 as the crossover procedure; LS is the local search by the PAM algorithm (Tables 5-7).
For the genetic algorithms, we used a population size NPOP starting from NPOP = 20. In [21], the authors show that smaller populations (NPOP < 10) in genetic algorithms with the greedy agglomerative crossover procedure decrease the accuracy of the result, and larger populations (NPOP > 50) slow down the algorithm, which also decreases the accuracy.
In all genetic algorithms, we used simple tournament selection. Traditionally [30], such algorithms do not contain any mutation operator.

Conclusion
The results of our computational experiments showed that the new search algorithms with variable neighborhoods (PAM-VNS) can outperform known algorithms and give more stable results (lower median and average values and/or standard deviation of the objective function, a smaller spread of the achieved values) and, consequently, better performance in comparison with known algorithms. The comparative efficiency of the new algorithm and its modifications was demonstrated experimentally on several data sets. However, for some of the considered data sets, the genetic algorithms show advantages over the new algorithms. Genetic algorithms show the best results in the case of a complex relief of the objective function with plateaus, due to the diversity in the population, as revealed in some of the cases considered. One of the most important shortcomings of such algorithms is the possible "dumping" of the entire population into the region of attraction of a single local minimum. The new VNS algorithms probably avoid this drawback due to the continuous random generation of new solutions, which serve as parameters of the search neighborhood, and allow the search to "jump out" of the attraction region.
Therefore, of interest for further research is a combined approach that unites a population with the possibility of generating random solutions, which would play a role similar to that of the mutation operator in traditional genetic algorithms.
