Anomaly Detection in IoT using Enhanced K-means, AGNES Clustering, and Echo State Networks
Abstract
In response to the poor performance of traditional Internet of Things (IoT) anomaly behavior detection models, this study examines the advantages and problems of clustering algorithms such as K-means, improves the clustering algorithm, and further optimizes it by combining it with echo state networks. A novel anomaly behavior detection model is proposed, based on an improved K-means algorithm, Agglomerative Nesting (AGNES), and a Deep Echo State Network (DeepESN). The core innovations of the model are: first, improving the centroid update method of K-means to address edge point interference and integrating AGNES to enhance adaptability to non-convex datasets; second, using a DeepESN optimized with a sparse orthogonal weight matrix to capture temporal features; and finally, integrating the improved clustering module and the optimized deep temporal feature extraction network into a complete detection framework. To validate the model's performance, experiments are conducted on multiple datasets: synthetic datasets, complex public benchmark datasets (ODDS) after dimensionality reduction, and real-world local IoT environments (a “U”-shaped non-convex dataset with 320 samples). Key evaluation metrics include detection accuracy, recall rate, latency, area under the curve (AUC), and mean absolute error (MAE). Experimental results show that on the synthetic dataset, the detection accuracy of the proposed model ranges from 0.91 to 0.99, significantly outperforming random forest (0.69–0.79), K-nearest neighbors (0.79–0.87), and standard K-means (0.83–0.91). After the maximum iteration count is reduced, the recall rate ranges from 80.86% to 93.27%, far exceeding the comparison methods (60.05% to 77.78%). On public datasets, KM-A exhibits a latency of 181–258 ms, while KM-A-E reduces latency to 120–194 ms via feature compression; the combined range of 120–258 ms reflects model adaptability across IoT tiers. In contrast, the latencies of random forest, K-nearest neighbors, and standard K-means range from 354 ms to 1153 ms. In detection on the actual local IoT “return” dataset, the detection accuracy of the proposed model for non-convex data is around 96.59% (overall 96.56%), far exceeding the model based on standard K-means (74.62%; overall 73.44%). In local IoT anomaly behavior detection, the MAE of the proposed model is 5.90, significantly lower than that of the standard K-means-based model (7.38). In receiver operating characteristic curve analysis, the AUC of the proposed model is 0.83, outperforming the standard K-means-based model (0.66). The study demonstrates that the proposed detection model, based on AGNES and DeepESN, can effectively enhance the efficiency and accuracy of anomaly detection in complex IoT environments, thereby providing a solid foundation for the broader application of IoT technology.
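The abstract does not specify how the sparse orthogonal weight matrix is constructed or how the reservoir is updated, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes the matrix is built as a product of a few random Givens rotations (one common way to obtain a matrix that is both sparse and exactly orthogonal) and that a single reservoir layer uses the standard leaky-integrator echo state update; a deep variant would feed each layer's states into the next. The names `sparse_orthogonal`, `reservoir_states`, `leak`, and `rho` are hypothetical.

```python
import math
import random


def sparse_orthogonal(n, n_rotations, rng):
    """Return an n x n orthogonal matrix built from a few Givens rotations.

    Assumption: this construction is illustrative only. Each rotation mixes
    just two rows of the identity, so with few rotations most entries stay zero.
    """
    w = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(n_rotations):
        i, j = rng.sample(range(n), 2)          # two distinct row indices
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        row_i, row_j = w[i], w[j]
        # Left-multiplying by a rotation keeps the matrix orthogonal.
        w[i] = [c * a - s * b for a, b in zip(row_i, row_j)]
        w[j] = [s * a + c * b for a, b in zip(row_i, row_j)]
    return w


def reservoir_states(inputs, w_in, w, leak=0.5, rho=0.9):
    """Run a leaky-integrator echo state update over an input sequence.

    x(t) = (1 - leak) * x(t-1) + leak * tanh(W_in u(t) + rho * W x(t-1))
    An orthogonal W has spectral radius 1, so rho < 1 scales it below 1.
    """
    n = len(w)
    x = [0.0] * n
    states = []
    for u in inputs:
        pre = [
            sum(w_in[k][d] * u[d] for d in range(len(u)))
            + rho * sum(w[k][m] * x[m] for m in range(n))
            for k in range(n)
        ]
        x = [(1.0 - leak) * x[k] + leak * math.tanh(pre[k]) for k in range(n)]
        states.append(x)
    return states


# Toy usage on a synthetic two-channel signal (not data from the study).
rng = random.Random(0)
n = 8
w = sparse_orthogonal(n, n_rotations=6, rng=rng)
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(n)]
series = [[math.sin(0.3 * t), math.cos(0.3 * t)] for t in range(20)]
states = reservoir_states(series, w_in, w)
```

In a sketch like this, the resulting `states` would serve as temporal features for a downstream detector; how the study combines them with the clustering module is not described in the abstract.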
DOI: https://doi.org/10.31449/inf.v49i7.9269
This work is licensed under a Creative Commons Attribution 3.0 License.








