Speech Signal Enhancement Using Progressive Learning and Dense Connected LSTM Networks

Nian Liu

doi:10.31449/inf.v49i10.8235

Abstract

To address the shortcomings of traditional speech enhancement models in modeling long-term dependencies and filtering noise, a speech signal enhancement model based on progressive learning and dense connection strategies is proposed. The method uses a long short-term memory (LSTM) network as its core and enhances noisy speech gradually through layer-by-layer learning. Experimental results show that the model performs well across a range of signal-to-noise ratios (SNRs). At an SNR of -5 dB, its short-time objective intelligibility (STOI) score reached 0.930, 4.1% higher than that of a time-delay neural network; at 10 dB, the STOI score rose further to 0.957. The source-to-distortion ratio (SDR) increased from 2.31 at -5 dB to 14.81 at 10 dB, reflecting the model's ability to suppress noise and reconstruct the signal. The perceptual evaluation of speech quality (PESQ) score increased from 1.86 at -5 dB to 3.13 at 10 dB, and the word error rate (WER) fell to 27.31%, 2.47% lower than that of a classical LSTM network. These results indicate that the proposed model is robust and enhances speech effectively at low SNRs, offering a new approach for applied language processing.
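The abstract does not give layer sizes, input features, or training targets. The following is a minimal pure-Python sketch, under stated assumptions, of a densely connected progressive learning layout: stage k receives the noisy features concatenated with the outputs of all earlier stages, and each stage is trained toward its own intermediate target (assumed here to be the same utterance at a progressively higher SNR, ending with clean speech). The names `DenseProgressiveLSTM`, `LSTMCell` and `progressive_loss` are hypothetical, the weights are random and untrained, and this is an illustration of the general technique rather than the author's exact model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_in, self.n_hidden = n_in, n_hidden
        n_cat = n_in + n_hidden  # each gate sees [x_t, h_{t-1}]
        # W[g][u] is the weight row of hidden unit u for gate g
        # (gate order: input, forget, candidate, output)
        self.W = [[[rng.uniform(-0.1, 0.1) for _ in range(n_cat)]
                   for _ in range(n_hidden)] for _ in range(4)]
        self.b = [[0.0] * n_hidden for _ in range(4)]

    def run(self, frames):
        """Process a sequence of feature frames; return the hidden state for each frame."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in frames:
            z = list(x) + h
            pre = [[sum(w * v for w, v in zip(row, z)) + b for row, b in zip(Wg, bg)]
                   for Wg, bg in zip(self.W, self.b)]
            i_gate = [sigmoid(v) for v in pre[0]]
            f_gate = [sigmoid(v) for v in pre[1]]
            cand = [math.tanh(v) for v in pre[2]]
            o_gate = [sigmoid(v) for v in pre[3]]
            c = [f * cj + i * g for f, cj, i, g in zip(f_gate, c, i_gate, cand)]
            h = [o * math.tanh(cj) for o, cj in zip(o_gate, c)]
            outputs.append(h)
        return outputs


class DenseProgressiveLSTM:
    """Hypothetical stack of LSTM stages with dense connections between stages."""

    def __init__(self, n_feat, n_hidden, n_stages, seed=0):
        rng = random.Random(seed)
        self.stages = []
        for k in range(n_stages):
            # dense connection: stage k sees the noisy input plus all k earlier estimates
            cell = LSTMCell(n_feat * (k + 1), n_hidden, rng)
            # linear output layer mapping the hidden state back to feature space
            proj = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_feat)]
            self.stages.append((cell, proj))

    def forward(self, noisy):
        """noisy: T frames of n_feat features; returns one estimate sequence per stage."""
        history = [noisy]
        estimates = []
        for cell, proj in self.stages:
            # concatenate, frame by frame, the noisy input and every earlier stage output
            stage_in = [[v for seq in history for v in seq[t]] for t in range(len(noisy))]
            hidden = cell.run(stage_in)
            est = [[sum(w * hv for w, hv in zip(row, ht)) for row in proj] for ht in hidden]
            estimates.append(est)
            history.append(est)
        return estimates


def progressive_loss(estimates, targets):
    """Sum over stages of the mean squared error to that stage's target.

    targets[k] is an assumed intermediate target for stage k (e.g. the utterance at a
    progressively higher SNR); the last target would be the clean speech features.
    """
    total = 0.0
    for est, tgt in zip(estimates, targets):
        sq = [(a - b) ** 2 for fe, ft in zip(est, tgt) for a, b in zip(fe, ft)]
        total += sum(sq) / len(sq)
    return total
```

For example, `DenseProgressiveLSTM(n_feat=4, n_hidden=8, n_stages=3).forward(noisy)` on a 5-frame input returns three estimate sequences of 5 frames by 4 features each, and the three stages take inputs of width 4, 8 and 12. Passing every earlier output forward, rather than only the previous stage's, is what "dense connection" means in this sketch.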
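Two of the reported metrics, SDR and WER, are simple enough to compute directly. The sketch below assumes equal-length sample sequences and whitespace-tokenized transcripts. `simple_sdr` is a simplified SDR (a plain energy ratio, without the projection step of the BSS Eval definition), and both function names are hypothetical. STOI and PESQ follow standardized, considerably more elaborate procedures and are not reproduced here.

```python
import math


def simple_sdr(reference, estimate):
    """Plain signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    Simplification: BSS Eval SDR first projects the estimate onto the reference;
    this version compares the raw estimate with the reference directly.
    """
    if len(reference) != len(estimate):
        raise ValueError("signals must have equal length")
    signal = sum(s * s for s in reference)
    error = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if error == 0:
        return math.inf  # perfect reconstruction
    return 10 * math.log10(signal / error)


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1] / len(ref)
```

For instance, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` returns 1/6 (one deletion over six reference words), and an estimate equal to 0.9 times the reference yields an SDR of 20 dB. Because SDR is logarithmic, each 10 dB gain corresponds to a tenfold drop in residual error energy relative to signal energy.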

This work is licensed under a Creative Commons Attribution 3.0 License.
