Parallelization of K-Means and Spatial Join Algorithms on Heterogeneous Platforms Using Apache Spark and GPU Integration for Enhanced AI Information Management
Abstract
In the era of AI-driven informatization, real-time mining of massive spatial data has become a core bottleneck for intelligent decision-making, and existing methods suffer from poor computational performance and high complexity. This study therefore proposes a solution based on heterogeneous computing platforms. The approach uses Apache Spark to build a hybrid system that integrates Central Processing Units (CPUs) and Graphics Processing Units (GPUs). It parallelizes the K-means algorithm with Spark's resilient distributed datasets (RDDs) and broadcast variables, optimizing both the selection of initial cluster centres and the determination of new centres. Upper and lower bound constraints, together with group filtering, are introduced to reduce computational complexity. For the spatial join algorithm, the study achieves efficient spatial data mining and dynamic load balancing through spatial index partitioning and a Compute Unified Device Architecture (CUDA) dynamic parallelization strategy. Experiments showed that the parallelized K-means algorithm achieved markedly higher speedups across different data dimensions. On 90-dimensional data in particular, it reached a speedup ratio of 45.32 times, with an execution efficiency 0.31 times higher than that of Spark MLlib. The parallel spatial join algorithm performed best with 1,500 partitions, completing the computation in 37.5 seconds with a maximum data mining accuracy of 94.2%, which exceeded DBSCAN and GeoSpark by 3.7% and 4.4%, respectively. The proposed method addresses the real-time requirements of spatial data mining in AI information management and provides scalable technical support for scenarios such as smart cities and autonomous driving.
DOI: https://doi.org/10.31449/inf.v49i30.10302
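The K-means parallelization summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: plain Python lists stand in for Spark partitions, the `centers` and `shifts` arguments play the role of broadcast variables, and the skip rule is a simple one-upper-bound, one-lower-bound test in the spirit of Hamerly's algorithm, which may differ from the bound scheme used in the paper. Group filtering and the GPU side are not shown. All function names (`dist`, `assign_partition`, `kmeans`) are hypothetical, and the sketch assumes at least two clusters.

```python
import math

def dist(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_partition(points, state, centers, shifts, use_bounds=True):
    """Process one partition (the 'map' side).

    state[i] = [assigned cluster, upper bound, lower bound] for points[i],
    updated in place. centers and shifts are read-only, like broadcast values.
    Returns per-cluster partial sums, counts, and the number of skipped points.
    """
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    skipped = 0
    max_shift = max(shifts)
    for p, st in zip(points, state):
        # Loosen both bounds by how far the centres moved last iteration.
        st[1] += shifts[st[0]]
        st[2] -= max_shift
        if use_bounds and st[1] <= st[2]:
            # The assigned centre is provably still the closest one.
            skipped += 1
        else:
            ranked = sorted((dist(p, c), j) for j, c in enumerate(centers))
            st[0], st[1], st[2] = ranked[0][1], ranked[0][0], ranked[1][0]
        counts[st[0]] += 1
        for d in range(dim):
            sums[st[0]][d] += p[d]
    return sums, counts, skipped

def kmeans(partitions, centers, iters=10, use_bounds=True):
    """Run a fixed number of Lloyd iterations over partitioned data."""
    k, dim = len(centers), len(centers[0])
    # Infinite upper bound and zero lower bound force a full first pass.
    states = [[[0, math.inf, 0.0] for _ in part] for part in partitions]
    shifts = [0.0] * k
    total_skipped = 0
    for _ in range(iters):
        results = [assign_partition(part, st, centers, shifts, use_bounds)
                   for part, st in zip(partitions, states)]
        # Merge the partial sums from all partitions (the 'reduce' side).
        new_centers = []
        for j in range(k):
            n = sum(r[1][j] for r in results)
            if n == 0:
                new_centers.append(centers[j])  # keep an empty cluster's centre
            else:
                new_centers.append(
                    [sum(r[0][j][d] for r in results) / n for d in range(dim)])
        total_skipped += sum(r[2] for r in results)
        shifts = [dist(c, nc) for c, nc in zip(centers, new_centers)]
        centers = new_centers
    labels = [[st[0] for st in part_state] for part_state in states]
    return centers, labels, total_skipped

# Toy usage: two partitions of 2-D points and two starting centres.
parts = [[[0.0, 0.2], [9.8, 10.1]], [[0.3, -0.1], [10.2, 9.9]]]
final_centers, labels, skipped = kmeans(parts, [[1.0, 1.0], [9.0, 9.0]], iters=3)
print(final_centers, labels, skipped)
```

In an actual Spark job, the per-partition function would typically run inside something like `mapPartitions`, with the centres re-broadcast each iteration and the partial sums merged by a reduce step. The sketch keeps that shape but avoids any Spark dependency so it can be run as is.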
License
Copyright © Slovenian Society Informatika







