Key Information Recognition in Speech Using Spectral Subtraction and Wavelet Thresholding Methods: An Optimized Approach with the S-PaddleSpeech Model
Abstract
An improved PaddleSpeech model was developed to address the low accuracy of key information detection in speech. Spectral subtraction is used to suppress noise interference, which raises the signal-to-noise ratio and improves the quality of the speech signal. Wavelet-transform thresholding is used alongside it, so that denoising can be balanced between the time domain and the frequency domain. In the data processing stage, the signal is split into frames, windowed, and Fourier transformed. The experimental results show that, for both specific and non-specific speech, the CNN detection algorithm reaches a keyword recognition accuracy of 0.9 with a sample size below 20, whereas the FNN algorithm reaches an accuracy of 0.8 only when the sample size reaches 60. CNN therefore outperforms FNN in both sample size requirements and keyword recognition accuracy. In application testing on 20 keywords in audio, the improved PaddleSpeech model recognizes keywords significantly better than the original PaddleSpeech model, with a recognition accuracy of up to 90% (P<0.05). In audio character recognition, the improved PaddleSpeech model correctly recognizes 15019 characters, an accuracy of 98.9580%, while the SpeechRecognition model correctly recognizes 14593 characters, an accuracy of 96.1520%. The improved model is thus 2.806 percentage points more accurate (P<0.05). The improved PaddleSpeech model proposed in this research therefore recognizes speech keywords well and effectively improves recognition accuracy.
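The preprocessing chain named in the abstract (framing, windowing, Fourier transform, spectral subtraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Hann window with 50% overlap, the over-subtraction factor `alpha`, the spectral floor `beta`, and the assumption that the first few frames contain only noise are all illustrative choices. A naive DFT from the standard library stands in for an FFT so that the block has no dependencies, and wavelet thresholding is not shown.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def spectral_subtract(signal, frame_len=64, noise_frames=4, alpha=1.0, beta=0.05):
    """Magnitude spectral subtraction with overlap-add resynthesis.

    Assumption: the first `noise_frames` frames contain noise only.
    """
    hop = frame_len // 2
    starts = range(0, len(signal) - frame_len + 1, hop)
    if len(starts) == 0:
        raise ValueError("signal is shorter than one frame")

    # Periodic Hann window: copies shifted by half a frame sum to exactly 1,
    # so plain overlap-add reconstructs the interior of the signal.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]

    # Framing, windowing, and Fourier transform of every frame.
    spectra = [dft([signal[s + i] * window[i] for i in range(frame_len)])
               for s in starts]

    # Noise estimate: mean magnitude per bin over the leading frames.
    lead = spectra[:noise_frames]
    noise_mag = [sum(abs(sp[k]) for sp in lead) / len(lead)
                 for k in range(frame_len)]

    out = [0.0] * len(signal)
    for s, sp in zip(starts, spectra):
        cleaned = []
        for k, value in enumerate(sp):
            mag = abs(value)
            # Subtract the noise magnitude, keep a small floor, reuse the phase.
            new_mag = max(mag - alpha * noise_mag[k], beta * mag)
            cleaned.append(cmath.rect(new_mag, cmath.phase(value)))
        frame = dft(cleaned, inverse=True)
        for i in range(frame_len):
            out[s + i] += frame[i].real
    return out


def make_demo_signal(n=512, seed=0):
    """Hypothetical test signal: noise only, then a tone plus noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.3)
            + (math.sin(2 * math.pi * 8 * i / 64) if i >= n // 2 else 0.0)
            for i in range(n)]
```

Calling `spectral_subtract(make_demo_signal())` returns a signal of the same length in which the leading noise-only stretch is attenuated. The floor `beta` keeps a small fraction of each bin instead of clipping to zero, a common way to limit the "musical noise" artifacts of plain subtraction.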
DOI: https://doi.org/10.31449/inf.v49i33.7593

This work is licensed under a Creative Commons Attribution 3.0 License.