Federated Learning-Based Distributed Autoencoder for Industrial Big Data Anomaly Detection: Integrating LSTM, GRU, and CNN Models

Xiaoli Li; Haifeng Wang

doi:10.31449/inf.v49i29.8511

Abstract

This paper proposes a deep learning solution for anomaly detection on big data in the industrial Internet of Things environment. Based on an analysis of the characteristics of time series and image data, LSTM and GRU networks are used to process time series data, and CNN models (such as ResNet and InceptionNet) are used for image analysis, so that complex patterns in the data are captured effectively. Model building covers not only architectural design but also optimization choices such as the Adam optimizer and loss function selection. In the preprocessing stage, the data are cleaned through standardization, normalization, de-trending, and de-noising to improve the learning efficiency of the model. The dataset comes from a real-world intelligent manufacturing plant and contains 1,000,000 records collected over three years. It consists of 12-dimensional sensor data (such as temperature, vibration frequency, and current intensity), and outliers account for approximately 5% of the total data. Experimental results demonstrate that the distributed autoencoder under the federated learning framework outperforms traditional methods. It achieves an accuracy of 0.97, a recall of 0.94, an F1-score of 0.95, and an AUC value of 0.96. However, it requires a longer training time of 45 minutes and incurs high communication costs, because more data is exchanged during training. In the federated learning process, each participating node independently trains the model on its local data, and the central server aggregates the parameters using the FedAvg strategy. Homomorphic encryption is used to protect data privacy and prevent leakage of the original data. Compared with baseline methods such as IQR and Isolation Forest, the accuracy of this method is improved by 8%, a difference that is statistically significant under a t-test (p < 0.05).
Despite the longer training time relative to traditional methods, the approach has clear advantages in complex industrial data processing and privacy protection, and it strikes a trade-off among computational efficiency, privacy protection, and detection performance. The 95% confidence intervals are [0.962, 0.978] for the precision rate of 0.97, [0.931, 0.949] for the recall rate of 0.94, [0.943, 0.957] for the F1 value of 0.95, and [0.952, 0.968] for the AUC value of 0.96.
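
The FedAvg aggregation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `fed_avg`, the parameter names such as `encoder.w`, and the flat-list weight layout are all hypothetical, and the homomorphic encryption layer described in the abstract is omitted. The sketch only assumes the standard FedAvg rule, in which the server averages client parameters weighted by each client's local sample count.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fed_avg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its local sample count."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        acc = [0.0] * len(client_weights[0][name])
        for weights, n in zip(client_weights, client_sizes):
            share = n / total  # this client's fraction of all training samples
            for i, value in enumerate(weights[name]):
                acc[i] += share * value
        aggregated[name] = acc
    return aggregated


# Two illustrative clients (made-up values, not data from the paper).
clients = [
    {"encoder.w": [1.0, 2.0], "encoder.b": [0.0]},
    {"encoder.w": [3.0, 4.0], "encoder.b": [1.0]},
]
sizes = [100, 300]
global_weights = fed_avg(clients, sizes)
print(global_weights)  # {'encoder.w': [2.5, 3.5], 'encoder.b': [0.75]}
```

In this example the second client holds three times as much data as the first, so its parameters contribute 75% of the aggregated value. In a privacy-preserving deployment of the kind the abstract describes, the same weighted sum would presumably be computed over encrypted updates rather than plaintext lists.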

DOI: https://doi.org/10.31449/inf.v49i29.8511

This work is licensed under a Creative Commons Attribution 3.0 License.

Informatica is financially supported by the Slovenian research agency from the Call for co-financing of scientific periodical publications.
