FAT-Net: A Spectral-Attention Transformer Network for Industrial Audio Anomaly Detection Using MFCC and Raw Features
Abstract
This paper proposes FAT-Net, an audio noise anomaly detection method that integrates big data with a Transformer-based architecture. The model combines Mel-Frequency Cepstral Coefficients (MFCCs) and raw audio features to capture both spectral and temporal characteristics. A novel Spectral Attention Mechanism (SAM) is introduced to enhance sensitivity to anomaly-relevant frequency bands. Experiments were conducted on a large industrial dataset comprising approximately 3,000 audio recordings collected under real manufacturing conditions. FAT-Net was evaluated using accuracy, precision, recall, and F1-score, and achieved a best F1-score of 98.05%, outperforming baseline models such as CNN (90.31%), LSTM (89.04%), and MFCC+LSTM (97.04%). These results demonstrate the effectiveness and generalization capability of FAT-Net for deployment in industrial environments.
DOI: https://doi.org/10.31449/inf.v49i26.8746
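For readers unfamiliar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed for a binary normal/anomalous task. It is not taken from the paper: the function name `classification_metrics`, the label convention (1 = anomalous, 0 = normal), and the toy labels are all assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the anomalous class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (invented for this example, not the paper's data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Because F1 is the harmonic mean of precision and recall, it penalizes a model that trades one for the other, which is presumably why the abstract reports it as the headline figure for an anomaly detection task.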
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







