Hybrid Variable-Length Spider Monkey Optimization with Good-Point Set Initialization for Data Clustering
Abstract
Data clustering groups data points that are similar according to their patterns or characteristics. It is used in applications such as image analysis, pattern recognition, and data mining. The K-means algorithm, commonly used for clustering, has well-known limitations: it requires the number of clusters to be specified in advance and is sensitive to the initial center points. To address these limitations, this study proposes a novel method that determines the optimal number of clusters and the initial centroids using a variable-length spider monkey optimization algorithm (VLSMO) with a proposed hybrid measure. Experiments on real-life datasets demonstrate that VLSMO outperforms standard K-means in terms of accuracy and clustering capacity.
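The idea summarized in the abstract, searching over candidate solutions of different lengths so that the number of clusters and the initial centroids are chosen together, can be illustrated with a minimal Python sketch. It is not the authors' VLSMO: the spider monkey update rules and the proposed hybrid measure are not given in the abstract, so plain random sampling stands in for the optimizer and the Davies-Bouldin index stands in for the fitness measure. All function names and parameters (`search_variable_length`, `k_min`, `k_max`, `n_candidates`) are hypothetical.

```python
import numpy as np

def assign(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def davies_bouldin(X, centroids):
    """Davies-Bouldin index of the partition induced by the centroids.

    Lower is better. ASSUMPTION: used here only as a stand-in for the
    paper's hybrid measure, which the abstract does not define.
    """
    labels = assign(X, centroids)
    used = np.unique(labels)
    if used.size < 2:
        return np.inf  # a single cluster cannot be scored
    cents = np.array([X[labels == j].mean(axis=0) for j in used])
    scatter = np.array([
        np.linalg.norm(X[labels == j] - c, axis=1).mean()
        for j, c in zip(used, cents)
    ])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # ignore each cluster paired with itself
    ratio = (scatter[:, None] + scatter[None, :]) / sep
    return ratio.max(axis=1).mean()

def search_variable_length(X, k_min=2, k_max=8, n_candidates=200, seed=0):
    """Score candidates of varying length k and keep the best one.

    ASSUMPTION: random sampling replaces the spider monkey optimizer.
    Each candidate is a (k, d) array, so k is part of the solution.
    """
    rng = np.random.default_rng(seed)
    best, best_score = None, np.inf
    for _ in range(n_candidates):
        k = int(rng.integers(k_min, k_max + 1))
        cand = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        score = davies_bouldin(X, cand)
        if score < best_score:
            best, best_score = cand, score
    return best, best_score

def kmeans(X, centroids, iters=50):
    """Plain Lloyd iterations started from the given centroids."""
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = assign(X, centroids)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, assign(X, centroids)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic data: three Gaussian blobs in two dimensions.
    X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 4.0, 8.0)])
    init, score = search_variable_length(X)
    centroids, labels = kmeans(X, init)
    print("chosen k:", len(centroids), "stand-in index:", round(score, 3))
```

The selected candidate fixes both the number of clusters and the starting centroids, which are then handed to ordinary K-means for refinement. This mirrors the two limitations of K-means that the abstract names.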
DOI: https://doi.org/10.31449/inf.v47i8.4872
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







