Multi-Density Datasets Clustering Using K-Nearest Neighbors and Chebyshev’s Inequality

Amira Bouchemal; Mohamed Tahar Kimour

doi:10.31449/inf.v47i8.4719

Abstract

Density-based clustering techniques are widely used in data mining across various fields. DBSCAN is one of the most popular density-based clustering algorithms, known for its ability to discover clusters of different shapes and sizes and to separate out noise and outliers. However, it still has two fundamental limitations: it requires the Eps distance threshold as an input parameter, and it performs poorly on datasets whose clusters have varying densities. To overcome these drawbacks, this work proposes a statistics-based technique. Specifically, the proposed technique computes an appropriate k-nearest neighbor density for each point, sorts the dataset in ascending order of that density, and then uses Chebyshev’s inequality, which holds for arbitrary distributions, to automatically determine different Eps values for clusters of various densities. Experiments conducted on synthetic and real datasets demonstrate the technique's efficiency and accuracy. The results indicate that it outperforms DBSCAN, DPC, and their recently proposed improvements.
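
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes the k-nearest neighbor density is measured as the mean distance to the k nearest neighbors, and that a new density level starts whenever the next sorted value exceeds the current level's mean plus c standard deviations, a cut-off motivated by Chebyshev’s inequality (at most 1/c² of any distribution lies more than c standard deviations from its mean). The names `knn_distance`, `eps_per_density_level`, `c`, and `min_level_size` are hypothetical.

```python
import numpy as np


def knn_distance(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbors.

    Assumption: this stands in for the paper's k-NN density measure.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0).
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    return nearest.mean(axis=1)


def eps_per_density_level(knn_dist, c=3.0, min_level_size=5):
    """Split sorted k-NN distances into levels and return one Eps per level.

    Assumption: a level ends when the next value exceeds mean + c * std of
    the values collected so far in that level (Chebyshev-style cut-off).
    """
    values = np.sort(knn_dist)
    eps_values = []
    start = 0
    for i in range(min_level_size, len(values)):
        level = values[start:i]
        if len(level) < min_level_size:
            continue  # too few values to estimate mean and std
        bound = level.mean() + c * level.std()
        if values[i] > bound:
            # Largest distance inside the finished level serves as its Eps.
            eps_values.append(float(level[-1]))
            start = i
    # The remaining values form the last (sparsest) level.
    eps_values.append(float(values[-1]))
    return eps_values


# Illustrative data only: one dense and one sparse Gaussian blob.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(200, 2))
sparse = rng.normal(loc=10.0, scale=2.0, size=(200, 2))
data = np.vstack([dense, sparse])

eps_list = eps_per_density_level(knn_distance(data, k=4))
print(eps_list)
```

Each returned value could then be passed as the Eps threshold to a separate DBSCAN run over the points of the corresponding density level. In this sketch the last level also absorbs any outliers, so its Eps may be too generous in practice.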

Authors

Amira Bouchemal, University of Badji Mokhtar - Annaba
Mohamed Tahar Kimour, University of Badji Mokhtar - Annaba

DOI:

https://doi.org/10.31449/inf.v47i8.4719

Published

10/06/2023

Issue

Vol. 47 No. 8 (2023): Online-only issue

Section

Online-only

License

Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.

All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.

Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.

How to Cite

Bouchemal, A., & Kimour, M. T. (2023). Multi-Density Datasets Clustering Using K-Nearest Neighbors and Chebyshev’s Inequality. Informatica, 47(8). https://doi.org/10.31449/inf.v47i8.4719
