Application of Algorithms with Variable Greedy Heuristics for k-Medoids Problems
Abstract
Progress in location theory methods and clustering algorithms is mainly targeted at improving the performance of the algorithms. The most popular clustering models are based on solving the p-median and similar location problems (k-means, k-medoids). In such problems, the algorithm must find several points called cluster centers, centroids, medoids, depending on the specific problem which minimize some function of distances from known objects to the centers. In the the k-medoids problem, the centers (medoids) of the cluster must coincide with one of the clustered objects. The problem is NP-hard, and the efforts of researchers are focused on the development of compromise heuristic algorithms that provide a fairly quick solution with minimal error. In this paper, we propose new algorithms of the Greedy Heuristic Method which use the idea of the Variable Neighborhood Search (VNS) algorithms for solving the k-medoids problem (which is also called the discrete p-median problem). In addition to the known PAM (Partition Around Medoids) algorithm, neighborhoods of a known solution are formed by applying greedy agglomerative heuristic procedures. According to the results of computational experiments, the new search algorithms (Greedy PAM-VNS) give more accurate and stable results (lower average value of the objective function and its standard deviation, smaller spread) in comparison with known algorithms on various data sets.Povzetek: Avtorji predlagajo nove algoritme za reševanje problema lokacije k-medoidov in grozdenja.References
Kazakovtsev L.A. (2016) The greedy heuristics method for systems of automatic grouping of objects. Diss ... Dr. tech. of science. Krasnoyarsk. Siberian Federal University.
Ghosh J., Acharya A. (2011) Cluster ensembles. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 1(4), pp..305−315.
Rozhnov I., Orlov V., Kazakovtsev L. (2018) Ensembles of clustering algorithms for problem of detection of homogeneous production batches of semiconductor devices. CEUR-WS, Vol.2098, pp.338-348.
Drezner Z., Hamacher H. (2004) Facility location: applications and theory. Berlin:Springer-Verlag.
Struyf A., Hubert M., Rousseeuw P. (1997) Clustering in an Object-Oriented Environment. Journal of Statistical Software, Issue 1 (4), pp.1-30.
Kaufman L., Rousseeuw P.J. (1990) Finding groups in data: an introduction to cluster analysis. New York:Wiley.
Moreno-Perez J.A., Roda Garcia J.L., Moreno-Vega J.M. (1994) A Parallel Genetic Algorithm for the Discrete p-Median Problem. Studies in Location Analysis, Issue 7, pp.131-141.
Wesolowsky, G. (1993) The Weber problem: History and perspectives. Location Science, No.1, pp.5-23.
Drezner Z., Wesolowsky G.O. (1978) A Trajectory Method for the Optimization of the Multifacility Location Problem with lp Distances. Management Science, Vol.24, pp.1507-1514.
Farahani R., Hekmatfar M. (2009). Facility location: Concepts, models, algorithms and case studies. Berlin Heidelberg:Springer-Verlag.
Deza M.M., Deza E. (2013) Metrics on Normed Structures. Encyclopedia of Distances. Berlin Heidelberg: Springer, pp.89-99, DOI: 10.1007/978-3-642-30958-85.
Weiszfeld, E. (1937) Sur le point sur lequel la somme des distances de n points donnes est minimum. Tohoku Mathematical Journal, Vol.43, No.1, pp.335-0386.
Kaufman L. and Rousseeuw P.J. (1987) Clustering by means of Medoids. Statistical Data Analysis Based on the L1-Norm and Related Methods, Springer US. pp. 405-416.
Kochetov Yu., Mladenovic N., Hansen P. (2003) Local search with alternating neighborhoods. Discrete analysis and operations research, Series 2, Vol.10(1), pp.11-43.
Nicholson T. A. J. (1965) A sequential method for discrete optimization problems and its application to the assignment, traveling salesman and tree scheduling problems. J. Inst. Math. Appl, Vol.13, pp.362-375.
Page E. S. (1965) On Monte Carlo methods in congestion problems. I: Searching for an optimum in discrete situations. Oper. Res. Vol.13(2), pp. 291-299.
Kernighan B. W., Lin S. (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst. Tech. J. Vol.49. pp.291-307.
Gastrigin L.А. (1978) Random search - specificity, stages of history and prejudice. Questions of cybernetics. M.: Nauch. Council on the complex problem "Cybernetics" of the USSR Academy of Sciences, Vol. 33, pp.3-16.
Kochetov Yu.А. (2010) Local search methods for discrete placement problems. Dis. Doctors of Physical and Mathematical Sciences. Novosibirsk.
Hansen P., Mladenovic N., Bruke E.K., Kendall G. (2014) Variable Neighborhood Search. Search Methodology. Springer US. P.211-238, doi: 10.1007/0-387-28356-0_8.
Kazakovtsev, L.A., Antamoshkin, A.N. (2014) Genetic Algorithm with Fast Greedy Heuristic for Clustering and Location Problems. Informatica, Vol.38(3), pp.229-240.
Osman I. H., Laporte G. (1996) Metaheuristics: a bibliography. Ann. Oper. Res, Vol.63. pp.513-628.
Mladenovic N., Hansen P. Variable neighborhood search. Comput. Oper. Res, Vol.24, P.1097-1100.
Hansen P., Mladenovic N. (2001) Variable neighborhood search: principles and applications (invited review). European J. Oper. Res, Vol.130(3), pp.449-467.
Lopez F. G., Batista B. M., Moreno-Perez J., Moreno-Vega M. (2000) The parallel variable neighborhood search for the p-median problem. Res. Rep. Univ. of La Laguna, Spain.
Brimberg J., Mladenovic N. (1996) A variable neighborhood algorithm for solving the continuous location-allocation problem. Stud. Locat. Anal. V. 10. pp. 1-12.
Hansen P., Mladenovic N., Perez-Brito D. (2001) Variable neighborhood decomposition search. J. Heuristics, Vol.7 (4), pp. 335-350.
Goldberg D. E. (1989) Genetic algorithms in search, optimization, and machine learning. Reading, MA: Addison-Wesley.
Houck C.R., Joines J. A., G.Kay. M. (1996) Comparison of Genetic Algorithms, Random Restart, and Two-Opt Switching for Solving Large Location-Allocation Problems. Computers and Operations Research, Vol. 23, pp. 587-596.
Alp O., Erkut E., Drezner Z. (2003) An Efficient Genetic Algorithm for the p-Median Problem. Annals of Operations Research. Vol.122, pp.21-42, DOI:10.1023/A:1026130003508. (2003).
Neema M.N., Maniruzzaman K.M., Ohgai A. (2011) New Genetic Algorithms Based Approaches to Continuous p-Median Problem. Netw. Spat. Econ., Vol.11, pp.83-99, DOI:10.1007/s11067-008-9084-5.
UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. access date 28.03.2019.
Clustering basic benchmark [http://cs.joensuu.fi/sipu/datasets], access date 28.03.2019.
Sheng W., Liu X. (2006) A genetic k-medoids clustering algorithm. Journal of Heuristics. Vol.12, No.6. P. 447-466.
Wilcoxon F. (1945) Individual comparisons by ranking methods. Biometrics Bulletin, Vol.1(6), pp. 80–83, DOI:10.2307/3001968.
Derrac J., Garcia S., Molina D., Herrera F. (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm and Evolutionary Computation, Vol. 1(1), pp. 3-18, DOI:j.swevo.2011.02.002.
DOI:
https://doi.org/10.31449/inf.v44i1.2737Downloads
Published
How to Cite
Issue
Section
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







