Wavelet Decompositions, Hierarchical Encoding and Convolutional Neural Network Integrated Lossless Audio Codec
Abstract
This paper proposes a lossless audio codec that combines wavelet transformation, hierarchical encoding, and a convolutional neural network (CNN) architecture. In the first phase, a three-level 1D wavelet decomposition is applied to the input audio to generate approximation and detail coefficients. In the next phase, these coefficients are transformed into binary streams by the proposed dynamic hierarchical encoding algorithm, which converts each coefficient to binary by dynamically accumulating the binary path values. In the subsequent phase, the binary stream is transformed into image patterns and compressed further by reducing its dimensionality with the proposed CNN model. The model's effectiveness is evaluated against conventional lossless audio benchmarks and machine-learning-based methods. Experimental results indicate that the method performs better than existing lossless audio techniques.
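The first phase can be illustrated with a minimal sketch. The abstract does not name the wavelet family, so the example below assumes an integer Haar lifting step (the S-transform) as a stand-in, because it maps integer samples to integer coefficients and inverts exactly. The function names `haar_forward`, `haar_inverse`, `decompose`, and `reconstruct` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def haar_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """One level of the integer Haar (S-transform) lifting step."""
    if len(x) % 2:
        raise ValueError("input length must be even")
    approx, detail = [], []
    for i in range(0, len(x), 2):
        d = x[i + 1] - x[i]        # detail: difference of the sample pair
        a = x[i] + (d >> 1)        # approximation: floor of the pair mean
        approx.append(a)
        detail.append(d)
    return approx, detail


def haar_inverse(approx: List[int], detail: List[int]) -> List[int]:
    """Exact inverse of haar_forward."""
    x = []
    for a, d in zip(approx, detail):
        even = a - (d >> 1)        # undo the update step
        odd = even + d             # undo the predict step
        x.extend((even, odd))
    return x


def decompose(x: List[int], levels: int = 3) -> Tuple[List[int], List[List[int]]]:
    """Multi-level decomposition; details[0] holds the finest level."""
    if len(x) % (1 << levels):
        raise ValueError("input length must be a multiple of 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append(d)
    return approx, details


def reconstruct(approx: List[int], details: List[List[int]]) -> List[int]:
    """Rebuild the signal from the coarsest level back to the finest."""
    x = list(approx)
    for d in reversed(details):
        x = haar_inverse(x, d)
    return x


# Usage: a three-level decomposition of eight integer samples round-trips exactly.
samples = [3, 7, -2, 4, 10, 10, -5, -8]
approx, details = decompose(samples, levels=3)
assert reconstruct(approx, details) == samples
```

An integer-to-integer transform is chosen here because a lossless codec needs bit-exact reconstruction; a floating-point wavelet filter bank would need additional care to guarantee that.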
DOI:
https://doi.org/10.31449/inf.v48i4.5496
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.







