Federated Learning-Based Distributed Autoencoder for Industrial Big Data Anomaly Detection: Integrating LSTM, GRU, and CNN Models
Abstract
This paper proposes a deep-learning-based solution to the challenge of anomaly detection in big data from the industrial Internet of Things. Following an in-depth analysis of the characteristics of time series and image data, LSTM and GRU networks are introduced to process time series data, and CNN models (such as ResNet and InceptionNet) are introduced for image analysis, so that complex patterns in the data are captured effectively. Model building covers not only architectural design but also optimization choices such as the Adam optimizer and the selection of the loss function. In the preprocessing stage, the data are cleaned through standardization, normalization, de-trending and de-noising to improve the learning efficiency of the model. The dataset used in the study comes from a real-world intelligent manufacturing plant and comprises 1,000,000 records collected over three years. It contains 12-dimensional sensor data (such as temperature, vibration frequency and current intensity), and outliers account for approximately 5% of the total data. Experimental results demonstrate that the distributed autoencoder under the federated learning framework outperforms traditional methods. It achieves an accuracy of 0.97, a recall of 0.94, an F1-score of 0.95, and an AUC value of 0.96. However, it has a longer training time of 45 minutes and incurs high communication costs during training because more data are exchanged. In the federated learning process, each participating node independently trains the model on its local data, and the FedAvg strategy is used to aggregate parameters on the central server. Homomorphic encryption is used to ensure data privacy and prevent leakage of the original data. Compared with baseline methods such as IQR and Isolation Forest, the accuracy of this method is improved by 8%, which a t-test shows to be statistically significant (p < 0.05).
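The FedAvg aggregation step mentioned above can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything here is an assumption made for illustration: the function name `fedavg`, the representation of parameters as a dictionary of flat lists, and the weighting of each node by its number of local samples (the usual FedAvg convention). The homomorphic encryption layer is omitted entirely.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def fedavg(client_params: Sequence[Params], client_sizes: Sequence[int]) -> Params:
    """Sample-weighted average of client parameters (FedAvg-style aggregation).

    client_params: one parameter dictionary per participating node.
    client_sizes:  number of local training samples on each node (assumed weighting).
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        # Each coordinate is the weighted mean of that coordinate across clients.
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two illustrative nodes with made-up weights and sample counts.
node_a = {"w": [1.0, 2.0]}
node_b = {"w": [3.0, 6.0]}
global_params = fedavg([node_a, node_b], [100, 300])
print(global_params)
```

In this toy case the node with 300 samples contributes three times the weight of the node with 100 samples, so the aggregated parameters sit closer to `node_b`. Only model parameters are exchanged with the server; the raw local records never leave the nodes.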
Although the 45-minute training time is longer than that of traditional methods, the approach has clear advantages in complex industrial data processing and privacy protection, and it balances computational efficiency against privacy protection and detection performance. The 95% confidence intervals are [0.962, 0.978] for the precision rate of 0.97, [0.931, 0.949] for the recall rate of 0.94, [0.943, 0.957] for the F1 value of 0.95, and [0.952, 0.968] for the AUC value of 0.96.
DOI: https://doi.org/10.31449/inf.v49i29.8511
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







