A Grid-Coupled Clustering Algorithm with Soft Constraints for Mixed-Attribute Data Streams
Abstract
Mixed-attribute data contains both numerical and categorical attributes, posing challenges for traditional clustering algorithms, which must cope with its dynamics and concept drift. This article proposes a mixed-attribute data stream clustering algorithm that incorporates soft constraints. First, the mixed-attribute data stream is normalized and locally linear embedding is applied for dimensionality reduction. Second, a mixed-attribute sliding window is designed, based on the idea of grid-coupled updating, which analyzes changes in grid centroids to adapt to the dynamic data stream. Finally, fuzzy mathematics is introduced to set soft constraints on interval boundaries (width) and grid cell density, restricting high-frequency cluster shifts. In the experiments, the proposed method is compared with a time-dimension feature extraction method based on unsupervised learning and a dual interactive generative adversarial network method. On the Forest Cover Type, GMD-4C2D800 Linear, and KDD CUP 99 datasets, the proposed method achieved a minimum CMM value of 0.894 and a minimum Purity value of 0.856, with an accuracy of up to 99.94% and a maximum NMI value of 1, all of which were superior to the comparison methods. The results indicate that the proposed algorithm can effectively adapt to changes in the data stream distribution and improve both clustering accuracy and computational efficiency.
DOI: https://doi.org/10.31449/inf.v50i13.10911
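To make the sliding-window and soft-constraint steps of the abstract more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes numerical attributes are already normalized to [0, 1], omits the dimensionality-reduction step, and uses a linear ramp as the fuzzy membership function for grid cell density. The names `GridWindow` and `dense_membership` and the parameters `width`, `size`, `low`, and `high` are hypothetical and chosen only for illustration.

```python
from collections import deque
from math import floor


def dense_membership(count, low, high):
    """Fuzzy degree to which a cell is 'dense' (assumed linear ramp):
    0 at or below `low`, 1 at or above `high`, linear in between."""
    if count <= low:
        return 0.0
    if count >= high:
        return 1.0
    return (count - low) / (high - low)


class GridWindow:
    """Fixed-size sliding window that keeps per-cell counts and sums."""

    def __init__(self, width, size):
        self.width = width      # grid interval width for numerical attributes
        self.size = size        # maximum number of points kept in the window
        self.window = deque()   # (cell key, numerical values) in arrival order
        self.counts = {}        # cell key -> number of points in the window
        self.sums = {}          # cell key -> per-dimension sum of values

    def cell_of(self, numeric, categorical):
        # Numerical attributes are binned; categorical values join the key as-is.
        return tuple(floor(v / self.width) for v in numeric) + tuple(categorical)

    def _update(self, key, numeric, sign):
        self.counts[key] = self.counts.get(key, 0) + sign
        sums = self.sums.setdefault(key, [0.0] * len(numeric))
        for i, v in enumerate(numeric):
            sums[i] += sign * v
        if self.counts[key] == 0:   # drop cells that became empty
            del self.counts[key]
            del self.sums[key]

    def add(self, numeric, categorical):
        # Evict the oldest point once the window is full, then insert the new one.
        if len(self.window) == self.size:
            old_key, old_numeric = self.window.popleft()
            self._update(old_key, old_numeric, -1)
        key = self.cell_of(numeric, categorical)
        self.window.append((key, list(numeric)))
        self._update(key, numeric, +1)
        return key

    def centroid(self, key):
        # Mean of the numerical attributes of the points currently in the cell.
        n = self.counts[key]
        return [s / n for s in self.sums[key]]
```

In this sketch, a cell whose membership value rises or falls only slightly between window updates would not immediately switch between dense and sparse, which is one simple way to read the abstract's goal of restricting high-frequency cluster shifts. Tracking how `centroid` moves for a given cell over successive updates would correspond loosely to the analysis of grid centroid changes.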
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







