Intelligent Archive Search System Using Cuckoo Search-Enhanced K-Prototypes Clustering
Abstract
The volume of data stored in archives continues to grow, and traditional search algorithms retrieve archive data with low accuracy. To address this problem, this research proposes an intelligent archive search method that uses the cuckoo search (CS) algorithm to improve the K-prototypes clustering algorithm. First, cuckoo search is used to improve the K-prototypes clustering algorithm and remedy its poor search ability. Then, the resulting CS-K-prototypes algorithm is applied to intelligent data search in archives. Finally, experiments are conducted on 200 sets of data collected from machine learning repositories and on real data from a large archive, using performance evaluation metrics including the precision-recall curve, MAE, RMSE, Jaccard coefficient (JC), Rand index (RI), and Fowlkes-Mallows score. The experimental results show that the proposed CS-K-prototypes algorithm achieves an area under the precision-recall curve of 0.9744 when searching archive data. Compared with the K-means clustering algorithm, the area under the precision-recall curve increases by 0.073; compared with the artificial bee colony algorithm, it increases by 0.2252 under conditions of abundant data. In the practical application experiment, the proposed model achieves a search accuracy of 97.67%. These results indicate that the improved K-prototypes clustering algorithm can improve the search accuracy of archive data.
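For readers unfamiliar with the external clustering indices named in the abstract, the sketch below shows one common way to compute the Jaccard coefficient, Rand index, and Fowlkes-Mallows score from pair counts. It uses the standard textbook definitions and is not taken from the paper; the function names `pair_counts` and `external_indices` and the toy labels are illustrative assumptions.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Classify every pair of samples by whether the two labelings agree.

    ss: same cluster in both; sd: same in truth only;
    ds: same in pred only;   dd: different in both.
    """
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            ss += 1
        elif same_truth:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def external_indices(truth, pred):
    """Return (Jaccard coefficient, Rand index, Fowlkes-Mallows score).

    Assumption: an index whose denominator is zero is reported as 0.0.
    """
    ss, sd, ds, dd = pair_counts(truth, pred)
    total = ss + sd + ds + dd
    jc = ss / (ss + sd + ds) if (ss + sd + ds) else 0.0
    ri = (ss + dd) / total if total else 0.0
    fm_den = sqrt((ss + sd) * (ss + ds))
    fm = ss / fm_den if fm_den else 0.0
    return jc, ri, fm


# Hypothetical toy labels: the predicted clustering splits the second true cluster.
jc, ri, fm = external_indices([0, 0, 1, 1], [0, 0, 1, 2])
print(f"JC={jc:.4f} RI={ri:.4f} FM={fm:.4f}")
```

All three indices lie in [0, 1], and higher values mean closer agreement between the predicted clusters and the reference labels. The pairwise loop is quadratic in the number of samples, so a contingency-table formulation would be preferable for large archives.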
DOI: https://doi.org/10.31449/inf.v49i13.8343
This work is licensed under a Creative Commons Attribution 3.0 License.








