CB-DBSCAN: A Core Ball-Enhanced Grid-Based Density Clustering Method for High-Dimensional Financial Data
Abstract
To optimize the allocation of financial resources, improve decision-making efficiency, and enhance market competitiveness, this paper proposes CB-DBSCAN, a clustering method for financial big data that combines Core Balls (CBs) with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). First, the algorithm improves DBSCAN with a grid tree: the data space is divided into non-empty grids, and a grid tree is built over them so that adjacent grids can be found quickly. At the same time, an angle-based pruning strategy uses geometric angle relationships to eliminate redundant trivial points, which reduces the number of distance calculations and lowers the computational complexity of the algorithm. On this basis, the concept of CBs is introduced to divide the dataset into two categories: CBs and non-CBs. The CBs are merged into the same cluster through a density-correlation judgment, and the non-CBs are separated into boundary points and trivial points, which improves the algorithm's processing ability and clustering quality on high-dimensional data. The results show that, on the KDD-CUP-99 dataset, CB-DBSCAN achieves an average accuracy of 99.1% and a recall of at least 98.2%. Its average precision and F1-Score are 98.2% and 98.7%, respectively, and its silhouette coefficient is at least 0.982. On the Choice Stock Trading Comprehensive Dataset, CB-DBSCAN achieves an accuracy of at least 98% and a recall of at least 97%. Its accuracy and F1-Score, both at least 98%, are higher than those of the other algorithms. Its average silhouette coefficient and average probability Rand index are 0.986 and 0.927, respectively, so its clustering quality is superior to that of the other algorithms. 
These results indicate that CB-DBSCAN can cluster high-dimensional financial data efficiently and accurately, and that it has good application prospects.
DOI: https://doi.org/10.31449/inf.v50i6.9905
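The grid step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the grid tree, the angle-based pruning, and the Core Ball construction are not specified in the abstract and are not reproduced here. The function names (`build_grid`, `dense_cells`, `neighbor_cells`) and the cell side of eps/sqrt(d) are assumptions borrowed from common grid-based DBSCAN variants, and a linear scan over the non-empty cells stands in for the grid tree.

```python
import math
from collections import defaultdict


def build_grid(points, eps):
    """Map each non-empty cell (a tuple of ints) to the indices of its points.

    Assumption: the cell side is eps / sqrt(d), so the cell diagonal is eps
    and any two points in the same cell are within eps of each other.
    """
    dim = len(points[0])
    side = eps / math.sqrt(dim)
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(x / side) for x in p)
        grid[cell].append(idx)
    return dict(grid)  # only non-empty cells are ever stored


def dense_cells(grid, min_pts):
    """Cells with at least min_pts points; every point in them is a core point."""
    return {cell for cell, members in grid.items() if len(members) >= min_pts}


def neighbor_cells(grid, cell, dim):
    """Non-empty cells that may hold points within eps of a point in `cell`.

    Hypothetical stand-in for the grid-tree lookup: a linear scan that keeps
    cells whose index differs by at most ceil(sqrt(d)) in every dimension.
    """
    reach = math.ceil(math.sqrt(dim))
    return [
        other for other in grid
        if other != cell and all(abs(a - b) <= reach for a, b in zip(other, cell))
    ]


points = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.05), (5.0, 5.0)]
grid = build_grid(points, eps=1.0)
core_cells = dense_cells(grid, min_pts=3)
nearby = neighbor_cells(grid, (0, 0), dim=2)
```

Storing only non-empty cells keeps memory proportional to the number of points rather than to the volume of the data space, which matters in high dimensions where most cells are empty.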
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







