Key Information Recognition in Speech Using Spectral Subtraction and Wavelet Thresholding Methods: An Optimized Approach with the S-PaddleSpeech Model
Abstract
An improved PaddleSpeech model (S-PaddleSpeech) was developed to address the low accuracy of key information detection in speech. Spectral subtraction is used to reduce noise interference, which enhances speech signal quality and improves the signal-to-noise ratio. Wavelet-transform processing is used to balance denoising across the time and frequency domains. Frame segmentation, windowing, and the Fourier transform are applied in the data processing stage. The experimental results show that, for both specific and non-specific speech, the CNN detection algorithm achieves a keyword recognition accuracy of 0.9 with a sample size of less than 20, while the FNN algorithm achieves an accuracy of 0.8 only when the sample size reaches 60. CNN therefore outperforms FNN in both sample size requirements and keyword recognition accuracy. In application testing, the improved PaddleSpeech model recognizes 20 keywords in audio significantly better than the original PaddleSpeech model, with a recognition accuracy of up to 90% (P<0.05). In the audio character recognition comparison between the improved PaddleSpeech model and the SpeechRecognition model, the former correctly recognizes 15019 characters with an accuracy of 98.9580%, while the latter correctly recognizes 14593 characters with an accuracy of 96.1520%. The former is therefore 2.806 percentage points more accurate than the latter (P<0.05). The improved PaddleSpeech model proposed in this research thus has good speech keyword recognition ability and effectively improves recognition accuracy.
DOI: https://doi.org/10.31449/inf.v49i33.7593
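The preprocessing steps named in the abstract (frame segmentation, windowing, Fourier transform, spectral subtraction, wavelet thresholding) can be illustrated with a minimal sketch. It is not the authors' implementation: the frame length, the 50% overlap, the Hann window, the one-level Haar wavelet, the soft-threshold rule, and all function names are assumptions chosen for illustration, and a naive DFT stands in for a real FFT so the example needs only the standard library.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def mean_noise_magnitude(noise, frame_len=32):
    """Average windowed magnitude spectrum over noise-only frames."""
    window = hann(frame_len)
    starts = range(0, len(noise) - frame_len + 1, frame_len // 2)
    mags = [[abs(c) for c in dft([noise[s + i] * window[i]
                                  for i in range(frame_len)])]
            for s in starts]
    return [sum(col) / len(mags) for col in zip(*mags)]


def spectral_subtract(signal, noise_mag, frame_len=32, floor=0.0):
    """Frame, window, transform, subtract noise magnitude, overlap-add."""
    hop = frame_len // 2
    window = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        cleaned = []
        for k, c in enumerate(dft(frame)):
            mag = abs(c)
            # Subtract the noise magnitude, clamp at a spectral floor,
            # and keep the phase of the noisy frame.
            new_mag = max(mag - noise_mag[k], floor * mag)
            cleaned.append(c * (new_mag / mag) if mag > 0 else 0j)
        frame_out = dft(cleaned, inverse=True)
        for i in range(frame_len):
            out[start + i] += frame_out[i].real
    return out


def haar_soft_denoise(x, threshold):
    """One-level Haar transform with soft-thresholded detail coefficients.

    Assumes len(x) is even.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    # Soft thresholding: shrink each detail coefficient toward zero.
    detail = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

In this sketch the noise magnitude would typically be estimated from a stretch of audio known to contain no speech, passed to `mean_noise_magnitude`, and then handed to `spectral_subtract`. With a zero noise estimate the overlap-add reproduces the input away from the signal edges, which is a convenient sanity check before tuning the spectral floor or the wavelet threshold.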
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







