Research on Recognition and Classification of Folk Music Based on Feature Extraction Algorithm



Introduction
As an art form, music can express people's thoughts, feelings, and lifestyles and plays a role in lifting people's emotions and spirits. With the improvement of living standards, music has become increasingly popular, and with the development of science and technology, more and more people enjoy music through the Internet. Finding the music that users want to listen to within a massive collection has therefore become increasingly important, and the recognition and classification of music have attracted growing attention.
Huang et al. [1] improved the hidden Markov model (HMM) using an artificial neural network (ANN). In practical music classification, they found that HMM was fast but classified poorly, whereas ANN classified well but had a high computational complexity. Combining the two improved the recognition rate of HMM by 4% to 5% while maintaining the same calculation speed as HMM. Abidin et al. [2] recognized a Turkish music data set, SymbTr, with ten machine learning algorithms and found that the performance of the algorithms was between 82% and 88%. Rao et al. [3] studied chord recognition: Pitch Class Profile features were extracted from raw audio and recognized by sparse representation. Experiments on MIREX09 showed that the method was robust to Gaussian white noise. Iloga et al. [4] studied the genre classification of music, designed a sequential pattern mining method, and carried out experiments on GTZAN. They found that the accuracy of the method was 91.6%, which was more than 7% higher than the existing classifiers.
Chinese folk music refers to the music played by traditional instruments, which has a strong artistic and national character [5], but there is little research on its recognition and classification. Therefore, this study took folk music as the research subject, extracted features in the time domain and the frequency domain, established a feature database, identified and classified folk music with a support vector machine (SVM), and verified the reliability of the method through experiments. The present study contributes to the automatic classification of folk music and the improvement of retrieval efficiency.
Musical instruments can be played solo or in an ensemble, and different combinations of musical instruments form different styles of instrumental music. For example, the music played by percussion instruments has a strong rhythm and rich timbre; the music performed with string instruments has a delicate, simple, and elegant style; the music played with wind instruments and string instruments tends to be light and lively; the music played with wind instruments and percussion instruments is joyful and enthusiastic.
Informatica 44 (2020) 521-525, X. Wang

Music feature extraction
To identify and classify folk music, it is necessary to extract the features of folk music. Music is composed of many single tones. In psychology, sound has the following four characteristics: (1) pitch: pitch refers to people's perception of the frequency of sound, which is determined by the number of vibrations of the sounding object; (2) sound duration: sound duration refers to the duration of a note, which is determined by the duration of the vibration; (3) sound intensity: sound intensity refers to the loudness that people perceive, which is determined by the vibration amplitude; (4) timbre: timbre refers to people's perception of sound quality, which is determined by the material, structure, and shape of the sounding body.
In the recognition of folk music, timbre is the main feature because the music is played by different instruments. Timbre is a short-term feature, which can be extracted from the following aspects.

Time-domain characteristics
Time-domain characteristics describe the waveform of the audio signal. The time-domain features selected in this study are as follows.
(1) Short-time average energy (STE): it reflects the change of music signal amplitude and refers to the average energy of the signal in a short-term audio window. For a short-time frame with a window length of $N$, suppose that the signal value of the $m$-th sampling point is $x(m)$ and the window function is represented by $w(n-m)$. For the $n$-th frame, its STE can be expressed as:
$$E_n = \frac{1}{N} \sum_{m=n-N+1}^{n} \left[ x(m)\, w(n-m) \right]^2$$
(2) Zero crossing rate (ZCR): it refers to the number of times a signal waveform passes through the zero point in a frame. For the $n$-th frame, its ZCR can be expressed as:
$$Z_n = \frac{1}{2} \sum_{m=n-N+1}^{n} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| w(n-m)$$
where $\operatorname{sgn}$ stands for the sign function,
$$\operatorname{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$
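The two time-domain features can be sketched in a few lines of plain Python. This is a minimal illustration, not the study's implementation: it assumes a rectangular window (every weight equal to 1), non-overlapping frames, and the convention sgn(0) = +1. The helper names `frame_signal`, `short_time_energy`, and `zero_crossing_rate` are hypothetical.

```python
def frame_signal(x, frame_len, hop):
    """Split a list of samples into frames of frame_len samples, advancing by hop."""
    return [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]


def sign(v):
    """Sign function with sgn(0) = +1 (an assumed convention)."""
    return 1 if v >= 0 else -1


def short_time_energy(frame):
    """Average energy of one frame under a rectangular window."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame):
    """Number of sign changes between consecutive samples inside one frame."""
    changes = sum(abs(sign(frame[i]) - sign(frame[i - 1])) for i in range(1, len(frame)))
    return changes // 2  # each crossing contributes |(+1) - (-1)| = 2


# Toy signal: a fast-alternating frame followed by a slowly varying frame.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, -0.5, -0.5]
frames = frame_signal(signal, frame_len=4, hop=4)
ste = [short_time_energy(f) for f in frames]
zcr = [zero_crossing_rate(f) for f in frames]
```

The first frame has the larger energy and the larger number of zero crossings, which is the kind of contrast these features are meant to expose between instruments.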

Frequency-domain characteristics
Audio contains a lot of information that can only be obtained through frequency-domain analysis. The frequency-domain features can be obtained by converting the signal to the frequency domain through the Fourier transform. The features selected in this study are as follows.
(1) Spectrum centroid (SC): it refers to the characteristic quantity of the spectrum center of a signal. The Fourier transform of the signal is represented by $F(\omega)$, $\omega \in (l, h)$, where the minimum and maximum values of frequency are represented by $l$ and $h$ respectively. Then SC can be expressed as:
$$SC = \frac{\int_{l}^{h} \omega \left| F(\omega) \right|^2 d\omega}{\int_{l}^{h} \left| F(\omega) \right|^2 d\omega}$$
(2) Spectrum energy (SE): it refers to the frequency-domain energy of the signal, which can be expressed as:
$$SE = \int_{l}^{h} \left| F(\omega) \right|^2 d\omega$$
(3) Mel frequency cepstrum coefficient (MFCC) [6]: it refers to the cepstrum characteristics at the Mel frequency, which has 13 dimensions. Suppose that the frequency of the music signal is $f$, then its Mel frequency is:
$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right)$$
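A discrete counterpart of the spectrum centroid and spectrum energy, together with the widely used Hz-to-Mel conversion Mel(f) = 2595 log10(1 + f/700), can be sketched as follows. The sketch assumes the power spectrum |F|^2 is used as the weight and replaces the integrals with sums over DFT bins; the naive DFT is only suitable for short frames, and the function names are hypothetical. The full 13-dimensional MFCC pipeline (filter bank, logarithm, DCT) is not reproduced here.

```python
import cmath
import math


def power_spectrum(frame):
    """Naive DFT power |F(k)|^2 for bins 0..N/2 (use an FFT for real audio)."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2)
    return power


def spectral_energy(power):
    """Discrete version of the spectrum energy: sum of the power over all bins."""
    return sum(power)


def spectral_centroid(power, sample_rate, n):
    """Power-weighted mean frequency in Hz; returns 0.0 for a silent frame."""
    total = sum(power)
    if total == 0:
        return 0.0
    freqs = [k * sample_rate / n for k in range(len(power))]
    return sum(f * p for f, p in zip(freqs, power)) / total


def hz_to_mel(f):
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


# A pure 2 kHz tone sampled at 16 kHz falls exactly on DFT bin 8 when N = 64.
sample_rate = 16000
n = 64
tone = [math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(n)]
power = power_spectrum(tone)
centroid = spectral_centroid(power, sample_rate, n)
energy = spectral_energy(power)
```

For a single tone the centroid sits at the tone's frequency; for real instrument recordings it shifts with the balance between low and high partials, which is why it is informative about timbre.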

Support vector machine-based classification algorithm
SVM is a machine learning method [7], which has significant advantages on small-sample and nonlinear problems and has been successfully applied in many fields, such as speech recognition [8] and image classification [9]. Suppose that in Euclidean space $R^d$, the training sample is $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$ ($y_i \in \{+1, -1\}$), the linear discriminant function is $g(x) = w \cdot x + b$, and the classification plane equation is $w \cdot x + b = 0$, where $w$ refers to the hyperplane normal vector, and $b$ refers to the offset. To separate the samples correctly, the problem can be expressed as:
$$\min_{w, b} \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i \left[ (w \cdot x_i) + b \right] \ge 1, \quad i = 1, 2, \ldots, n$$
In the linearly inseparable case, relaxation variable $\xi$ and penalty factor $C$ are introduced. Then the above equation is transformed into:
$$\min_{w, b, \xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left[ (w \cdot x_i) + b \right] \ge 1 - \xi_i, \quad \xi_i \ge 0$$
The Lagrange function is introduced to solve the above equation. The Lagrange coefficient is set as $\alpha_i$, then the optimal classification function is:
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b \right)$$
For any unclassified sample $x$, the result of classification can be obtained by calculating $f(x)$. $K(x_i, x_j)$ represents the kernel function. In SVM, the commonly used ones are:
(1) linear kernel function: $K(x_i, x_j) = x_i \cdot x_j$;
(2) polynomial kernel function: $K(x_i, x_j) = [(x_i \cdot x_j) + 1]^q$;
(3) radial basis function (RBF) kernel: $K(x_i, x_j) = \exp\left( -\|x_i - x_j\|^2 / \sigma^2 \right)$.
In SVM, the RBF kernel function is the most commonly used and has the best performance; therefore, this study uses the RBF kernel function. In SVM, the values of the kernel function parameter $\sigma$ and the penalty parameter $C$ have a great influence on the results [10] and need to be determined in the experiment.
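The kernel functions and the kernelized decision rule described above can be sketched in plain Python. Several things here are assumptions rather than details from the study: the RBF kernel is written as exp(-||a - b||^2 / sigma^2), which is one common parameterization (others divide by 2 sigma^2 or use a gamma factor); the polynomial offset of 1 and degree 2 are illustrative defaults; and the support vectors, coefficients, and bias in the toy example are hand-picked, not the output of training.

```python
import math


def linear_kernel(a, b):
    """Plain dot product."""
    return sum(p * q for p, q in zip(a, b))


def polynomial_kernel(a, b, degree=2):
    """Polynomial kernel with offset 1 (assumed form)."""
    return (linear_kernel(a, b) + 1.0) ** degree


def rbf_kernel(a, b, sigma2):
    """RBF kernel exp(-||a - b||^2 / sigma2) (assumed parameterization)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / sigma2)


def decision(x, support_vectors, labels, alphas, bias, kernel):
    """Sign of the kernel expansion sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * y * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + bias
    return 1 if score >= 0 else -1


# Hand-picked toy model: one support vector per class, equal weights, zero bias.
svs = [[0.0, 0.0], [2.0, 2.0]]
ys = [-1, +1]
alphas = [1.0, 1.0]


def kernel(a, b):
    return rbf_kernel(a, b, sigma2=64.0)


near_negative = decision([0.1, 0.0], svs, ys, alphas, 0.0, kernel)
near_positive = decision([1.9, 2.0], svs, ys, alphas, 0.0, kernel)
```

A binary rule of this kind only separates two classes; classifying several instrument types requires combining binary classifiers (for example one-vs-one or one-vs-rest), and the text does not say which scheme was used.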

Experimental analysis

Folk music data set
The folk music was downloaded from the Internet and then converted to single-channel WAV format with a sampling frequency of 16 kHz using GoldWave software. The music files were sliced into 10 s segments using CoolEdit software. The final data sets obtained are shown in Table 2. Features were extracted from the obtained data set, including 13-dimensional MFCC features and four one-dimensional features. The average value and standard deviation were taken, so that each segment obtained 36-dimensional features. Then 80% of the features were selected as the training set, and 20% as the testing set.
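The data preparation steps (10 s slicing at 16 kHz, per-segment mean and standard deviation, 80/20 split) can be sketched as below. This is an illustration under assumptions: the population standard deviation is used, the split is a seeded random shuffle, and all function names are hypothetical. Note that taking the mean and standard deviation of the 17 per-frame values named in the text (13 MFCC plus four scalar features) yields 34 numbers, so this sketch does not reproduce the 36 dimensions reported.

```python
import random
import statistics

SAMPLE_RATE = 16000      # Hz, as stated in the text
SEGMENT_SECONDS = 10     # segment length, as stated in the text


def slice_segments(samples, sample_rate=SAMPLE_RATE, seconds=SEGMENT_SECONDS):
    """Cut a recording into consecutive full-length segments, dropping the remainder."""
    size = sample_rate * seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def summarize(frame_features):
    """Collapse per-frame feature vectors into [means..., standard deviations...]."""
    columns = list(zip(*frame_features))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]  # population std (assumed)
    return means + stds


def train_test_split(items, train_fraction=0.8, seed=0):
    """Seeded random split into training and testing subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy check with two frames of 17 identical per-frame values each.
toy_frames = [[1.0] * 17, [3.0] * 17]
segment_vector = summarize(toy_frames)
segments = slice_segments(list(range(35)), sample_rate=1, seconds=10)
train, test = train_test_split(list(range(10)))
```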

Experimental results
Firstly, the two parameters of SVM needed to be determined. Two hundred samples were selected, and the parameter values were determined through cross-validation, as shown in Tables 3 and 4.
It was seen from Tables 3 and 4 that the recognition rate of SVM was the highest when $C = 4$ and $\sigma^2 = 2^6$. Therefore, $C = 4$ and $\sigma^2 = 2^6$ were selected as the optimal parameters for the experiment.
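A parameter search of this kind is commonly organized as a grid search scored by k-fold cross-validation. The sketch below shows only the search loop and assumes several things not given in the text: five folds, the candidate grids, and a pluggable `score_fold` callback that would train an SVM on the training indices and return its accuracy on the validation indices. The `toy_score` function is a synthetic stand-in, deliberately constructed to peak at the values the text reports so that the loop has something to find; it is not a trained classifier and produces no experimental result.

```python
import math


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        val = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, val
        start = stop


def grid_search(c_values, sigma2_values, n_samples, k, score_fold):
    """Return (mean_score, C, sigma2) for the best-scoring parameter pair."""
    best = None
    for c in c_values:
        for sigma2 in sigma2_values:
            scores = [score_fold(c, sigma2, train, val)
                      for train, val in k_fold_indices(n_samples, k)]
            mean_score = sum(scores) / len(scores)
            if best is None or mean_score > best[0]:
                best = (mean_score, c, sigma2)
    return best


def toy_score(c, sigma2, train_idx, val_idx):
    """Synthetic score (NOT a real SVM): highest at C = 4, sigma2 = 2**6."""
    return -abs(math.log2(c) - 2) - abs(math.log2(sigma2) - 6)


c_grid = [1, 2, 4, 8, 16]                    # hypothetical candidate values
sigma2_grid = [2 ** p for p in range(2, 9)]  # hypothetical candidate values
best = grid_search(c_grid, sigma2_grid, n_samples=200, k=5, score_fold=toy_score)
```

In a real run, `score_fold` would be replaced by a function that fits the classifier on the training fold and measures the recognition rate on the validation fold.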
The influence of feature selection on the results was compared. The selected features were the time-domain, frequency-domain, and time + frequency-domain features of folk music. The results are shown in Figure 1. It was seen from Figure 1 that the recognition rate of SVM was 85.33% when only the time-domain features of folk music were selected and 88.94% when only the frequency-domain features were selected; the relative increase of 4.23% might be due to the larger number of feature dimensions contained in the frequency domain. When all the features were used for recognition, the recognition rate of SVM was 92.76%, which was 8.7% and 4.3% higher in relative terms than with the time-domain and frequency-domain features alone. It was found that the recognition effect of SVM was good when all the features were used.
The recognition performance of SVM for different types of folk music was compared, and the results are shown in Figure 2. It was seen from Figure 2 that SVM had the highest recognition rate for erhu, 96.37%, which might be because erhu was the only bowed string instrument in the folk music data set of this study and was therefore significantly different from the other types of folk music. The recognition rate of SVM was 91.38%, 90.77%, and 89.64% for Zheng, Chinese lute, and Guqin, respectively, which might be because the three instruments were somewhat similar and more difficult to distinguish.
To further verify the recognition performance of SVM, BP neural network (BPNN) [11], decision tree [12], and SVM were compared on the same folk music data set. The results are shown in Figure 3. It was seen from Figure 3 that the recognition rates of the three algorithms were 73.48%, 64.29%, and 92.76%, respectively, and the recognition rate of SVM was 26.24% higher than that of BPNN and 44.28% higher than that of the decision tree in relative terms. The results showed that SVM had significant advantages in the classification and recognition of folk music.
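The improvement figures quoted in this section are consistent with relative gains, (new - old) / old, rather than differences in percentage points. The short check below recomputes them from the recognition rates stated in the text; the function and dictionary names are hypothetical.

```python
def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


# Recognition rates (%) as stated in the text.
rates = {
    "time": 85.33,
    "frequency": 88.94,
    "time+frequency": 92.76,   # also the SVM rate in the algorithm comparison
    "BPNN": 73.48,
    "decision tree": 64.29,
}

gains = {
    "frequency vs time": relative_gain(rates["frequency"], rates["time"]),
    "all vs time": relative_gain(rates["time+frequency"], rates["time"]),
    "all vs frequency": relative_gain(rates["time+frequency"], rates["frequency"]),
    "SVM vs BPNN": relative_gain(rates["time+frequency"], rates["BPNN"]),
    "SVM vs decision tree": relative_gain(rates["time+frequency"], rates["decision tree"]),
}

# Absolute gap in percentage points, for contrast with the relative gain.
points_vs_bpnn = rates["time+frequency"] - rates["BPNN"]
```

For example, SVM exceeds BPNN by about 19.28 percentage points, which corresponds to the quoted relative gain of about 26.24%.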

Discussion
The current research on music recognition and classification includes the classification of genres [13], musical instruments [14], emotions [15], composers, and so on. Through identification and classification, users can quickly and accurately retrieve the music they want to hear, and it is also more convenient to manage the music. With the development of technology, music recognition and classification have made great progress, and more and more machine learning methods have been applied, such as hidden Markov models, decision trees, and nearest neighbor methods [16]. In this study, SVM was used for classifying folk music.
In the identification and classification of folk music, this study extracted the time-domain and frequency-domain features to form the folk music data set and then used the SVM method for classification. In the experiment, to obtain the optimal parameters of SVM, this study analyzed the influence of different values on the results by cross-validation, and the obtained optimal parameters were then used for the next step of the experiment. The results showed that the recognition rate of SVM was higher when more comprehensive features were selected. In folk music recognition, when using both time-domain and frequency-domain features, the recognition rate of SVM reached 92.76%. In recognizing different types of folk music, the recognition rate of SVM for erhu was the highest (96.37%), while the recognition rates of the three plucked instruments were relatively low. For comparison with other methods, this study selected BPNN and decision tree. It was seen from Figure 3 that the recognition rate of SVM used in this study was significantly higher than that of the other two methods, which indicated that SVM had a better performance in the recognition of folk music.
Although some achievements have been made in this paper, further research is needed. In future work, we will: (1) further study the selection of features; (2) further improve the classification performance of SVM; (3) perform experiments on a more extensive data set.

Conclusion
In this study, the method of feature extraction was analyzed for the recognition and classification of folk music, SVM was selected as the classifier, and a data set was established for experimental analysis. The results demonstrated that: (1) the selection of parameters had an influence on the result of folk music recognition; (2) when all the features were used, the recognition rate of SVM was the highest (92.76%);

Figure 1: The influence of feature selection on the recognition rate.

Figure 2: Recognition effect of different types of folk music.

Figure 3: Comparison of recognition effects of different algorithms.

Table 2: Data sets of folk music.

Table 3: The influence of the value of $C$ on the recognition rate when the value of $\sigma^2$ takes 2.