Detection of Synthetic Speech Using Spectral-Cepstral Features and BiLSTM Networks

Abstract

Experimental results demonstrate 93.4% accuracy on the test set. Error analysis reveals that misclassifications occur predominantly between the Person and Robot classes, whereas the Emotion class is recognized more reliably. Feature comparison indicates that log-mel features provide a robust baseline at minimal computational cost, LFCC better preserves the high-frequency details characteristic of synthetic artifacts, and CQCC is effective at capturing harmonic structure and modulations. Potential directions for improving generalizability and accuracy are discussed, including feature fusion (CQCC/LFCC/log-mel) and statistical pooling for temporal aggregation. The proposed configuration offers a well-balanced trade-off between performance and computational complexity and serves as a strong baseline for anti-spoofing systems.
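The two improvement directions named in the abstract, feature fusion and statistical pooling, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuse_features` and `stats_pool`, the frame-wise concatenation scheme, and the use of mean and standard deviation as pooling statistics are assumptions made for illustration, and the feature values are toy numbers rather than real log-mel or LFCC output.

```python
import math
from typing import List

Frames = List[List[float]]  # shape: (num_frames, num_coefficients)


def fuse_features(*streams: Frames) -> Frames:
    """Concatenate frame-aligned feature streams along the coefficient axis.

    Assumption: every stream was computed with the same hop size, so frame t
    in one stream corresponds to frame t in the others.
    """
    num_frames = len(streams[0])
    if any(len(s) != num_frames for s in streams):
        raise ValueError("all feature streams must have the same number of frames")
    return [
        [value for s in streams for value in s[t]]
        for t in range(num_frames)
    ]


def stats_pool(frames: Frames) -> List[float]:
    """Collapse a variable-length sequence into one fixed-length vector.

    Returns the per-coefficient means followed by the per-coefficient
    (population) standard deviations, computed over the time axis.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


# Toy example: two frames of a 2-dimensional "log-mel" stream and a
# 1-dimensional "LFCC" stream (values are made up for illustration).
log_mel = [[1.0, 2.0], [3.0, 4.0]]
lfcc = [[0.0], [2.0]]

fused = fuse_features(log_mel, lfcc)   # 2 frames x 3 coefficients
pooled = stats_pool(fused)             # 3 means followed by 3 stds
```

In a recurrent model, pooling of this kind would typically be applied to the per-frame outputs of the sequence encoder, so that utterances of different lengths map to vectors of the same size before classification.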

Author Biographies

Furkat Rakhmatov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

Associate Professor, Head of the Department of Programming Technologies

Fakhriddin Abdirazakov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

PhD student, Department of Computer Systems

Baxodir Achilov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

Senior Lecturer, Department of Computer Systems

Ruslan Baydullayev, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

PhD student, Department of Software of Information Technologies

Sultanmurat Nasirov, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

PhD student, Department of Software of Information Technologies

Shakhzod Javliev, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi, Tashkent 100084, Uzbekistan

Researcher, Department of Computer Systems

DOI:

https://doi.org/10.31449/inf.v49i36.12281

Published

12/20/2025

How to Cite

Rakhmatov, F., Abdirazakov, F., Achilov, B., Baydullayev, R., Nasirov, S., & Javliev, S. (2025). Detection of Synthetic Speech Using Spectral-Cepstral Features and BiLSTM Networks. Informatica, 49(36). https://doi.org/10.31449/inf.v49i36.12281