Parallelization of K-Means and Spatial Join Algorithms on Heterogeneous Platforms Using Apache Spark and GPU Integration for Enhanced AI Information Management
Abstract
In the era of artificial intelligence informatization, real-time mining of massive spatial data has become a core bottleneck for intelligent decision-making, and existing methods suffer from poor computational performance and high complexity. This study therefore proposes a solution based on heterogeneous computing platforms. The approach uses Apache Spark to build a hybrid system that integrates Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The K-means algorithm is parallelized through Spark's resilient distributed datasets (RDDs) and broadcast variables, with optimizations to both the initial cluster centre selection and the new centre determination steps. In addition, upper and lower bound constraints and group filtering techniques are introduced to reduce computational complexity. For the spatial join algorithm, the study achieves efficient spatial data mining and dynamic load balancing through spatial index partitioning and a Compute Unified Device Architecture (CUDA) dynamic parallelization strategy. Experiments showed that the parallelized K-means algorithm achieved significantly improved speedup across different data dimensions. On 90-dimensional data in particular, the acceleration ratio reached 45.32 times, and the execution efficiency was 0.31 times higher than that of Spark MLlib. The parallel spatial join algorithm achieved its best performance with 1,500 partitions, completing computations in 37.5 seconds. Its maximum data mining accuracy reached 94.2%, exceeding DBSCAN and GeoSpark by 3.7% and 4.4%, respectively. The proposed method effectively addresses the real-time problem of spatial data mining in artificial intelligence information management and provides scalable technical support for scenarios such as smart cities and autonomous driving.
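The parallel K-means pattern described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than Spark, and it is an assumption about the general shape of such a scheme, not the implementation used in the study: the current centres are shared read-only with every partition (the role a broadcast variable would play), each partition computes per-cluster sums and counts, and the partial results are merged to obtain the new centres. The function names `nearest`, `partial_sums`, `merge`, and `kmeans_step` are hypothetical, and the bound constraints, group filtering, and GPU offloading mentioned above are omitted.

```python
def nearest(point, centres):
    """Index of the centre closest to point (squared Euclidean distance)."""
    return min(range(len(centres)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centres[j])))

def partial_sums(partition, centres):
    """Map step for one partition: per-cluster coordinate sums and counts."""
    acc = {}
    for point in partition:
        j = nearest(point, centres)
        sums, count = acc.get(j, ([0.0] * len(point), 0))
        acc[j] = ([s + p for s, p in zip(sums, point)], count + 1)
    return acc

def merge(a, b):
    """Reduce step: combine two partial results cluster by cluster."""
    out = dict(a)
    for j, (sums, count) in b.items():
        if j in out:
            s0, c0 = out[j]
            out[j] = ([x + y for x, y in zip(s0, sums)], c0 + count)
        else:
            out[j] = (sums, count)
    return out

def kmeans_step(partitions, centres):
    """One iteration; centres are read-only, like a broadcast variable."""
    total = {}
    for part in partitions:  # sequential here; a cluster would run these in parallel
        total = merge(total, partial_sums(part, centres))
    new_centres = list(centres)  # a cluster with no points keeps its old centre
    for j, (sums, count) in total.items():
        new_centres[j] = [s / count for s in sums]
    return new_centres

# Two partitions of 2-D points and two starting centres (toy data).
partitions = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 10.0], [10.0, 12.0]]]
centres = kmeans_step(partitions, [[1.0, 1.0], [9.0, 9.0]])
print(centres)
```

Only the small per-cluster sums and counts cross partition boundaries, which is why this map-and-merge shape is a common way to distribute the centre update; the points themselves never need to be collected in one place.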
DOI: https://doi.org/10.31449/inf.v49i30.10302
This work is licensed under a Creative Commons Attribution 3.0 License.








