FAT-Net: A Spectral-Attention Transformer Network for Industrial Audio Anomaly Detection Using MFCC and Raw Features

Yanhua Shi

Abstract


This paper proposes FAT-Net, an audio anomaly detection method that integrates large-scale industrial audio data with a Transformer-based architecture. The model combines Mel-Frequency Cepstral Coefficients (MFCCs) and raw audio features to capture both spectral and temporal characteristics. A novel Spectral Attention Mechanism (SAM) is introduced to enhance sensitivity to anomaly-relevant frequency bands. Experiments were conducted on a large industrial dataset comprising approximately 3,000 audio recordings collected under real manufacturing conditions. FAT-Net was evaluated using accuracy, precision, recall, and F1-score, achieving a best F1-score of 98.05% and outperforming baseline models such as CNN (90.31%), LSTM (89.04%), and MFCC+LSTM (97.04%). These results demonstrate the effectiveness and generalization capability of FAT-Net for deployment in industrial environments.
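The abstract does not include implementation details. The PyTorch sketch below illustrates one plausible reading of the described design: an MFCC branch whose frequency channels are re-weighted by a spectral attention step, a raw-waveform branch, feature fusion, and a Transformer encoder feeding a classifier. All layer sizes, the sigmoid gating used for the spectral attention, and the concatenation-based fusion are assumptions for illustration, not the authors' implementation.

# Minimal sketch (assumed structure, not the authors' released code) of the two
# ideas named in the abstract: fusing MFCC features with raw-audio features, and
# a spectral attention step that re-weights MFCC frequency channels before a
# Transformer encoder.
import torch
import torch.nn as nn


class SpectralAttention(nn.Module):
    """Learns a per-frequency-band weight and rescales the MFCC channels."""

    def __init__(self, n_mfcc: int):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(n_mfcc, n_mfcc), nn.Sigmoid())

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:
        # mfcc: (batch, time, n_mfcc)
        weights = self.score(mfcc.mean(dim=1))   # (batch, n_mfcc)
        return mfcc * weights.unsqueeze(1)       # broadcast the weights over time


class FATNetSketch(nn.Module):
    """MFCC branch + raw-audio branch -> fusion -> Transformer -> classifier."""

    def __init__(self, n_mfcc: int = 40, d_model: int = 128, n_classes: int = 2):
        super().__init__()
        self.spectral_attn = SpectralAttention(n_mfcc)
        self.mfcc_proj = nn.Linear(n_mfcc, d_model // 2)
        # 1-D conv front end turns the raw waveform into a frame-level sequence.
        self.raw_encoder = nn.Conv1d(1, d_model // 2, kernel_size=400, stride=160)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, mfcc: torch.Tensor, waveform: torch.Tensor) -> torch.Tensor:
        # mfcc: (batch, time, n_mfcc); waveform: (batch, samples)
        spec = self.mfcc_proj(self.spectral_attn(mfcc))                 # (batch, T1, d/2)
        raw = self.raw_encoder(waveform.unsqueeze(1)).transpose(1, 2)   # (batch, T2, d/2)
        t = min(spec.size(1), raw.size(1))                              # align lengths
        fused = torch.cat([spec[:, :t], raw[:, :t]], dim=-1)            # (batch, t, d_model)
        hidden = self.transformer(fused)
        return self.classifier(hidden.mean(dim=1))                      # pooled logits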




DOI: https://doi.org/10.31449/inf.v49i26.8746

This work is licensed under a Creative Commons Attribution 3.0 License.