Research on Chord Generation in Automated Music Composition Using Deep Learning Algorithms



Introduction
Music creation demands considerable knowledge, relevant experience, inspiration, and creativity from the creator. Music composition is therefore usually carried out by professional composers with strong expertise, which poses great difficulties for amateur enthusiasts [1]. With the rapid development of technologies such as artificial intelligence, computer-aided music composition, or algorithmic composition, has gradually attracted the attention of researchers [2]. Computer-aided music composition is a method that uses computer technology and combines knowledge from fields such as mathematics and music to create music and assist musicians [3]. However, this field requires the creator to have not only a good foundation in algorithms but also a certain level of musical knowledge. Research on automated music composition is therefore challenging. Music is an essential form of entertainment in people's lives, and artificial intelligence is currently a key area of development [4]. Automated music composition based on artificial intelligence thus has great development prospects [5]. It not only lowers the threshold of music composition to a certain extent but also provides more musical resources for music lovers. In a piece of music, melody and chords play very important roles and affect the listenability of the music. At present, research on automatic music composition has produced many results on melody generation but little on chord generation. Therefore, this paper designs a deep learning-based method for chord generation that uses a bidirectional Transformer model to generate chords. Analysis of the coherence and pleasantness of the generated chords supports the reliability of the method, which helps produce more pleasant chords and contributes to further research on automatic music composition.

Related works
The following

Relevant music knowledge and automatic music composition
The basic attributes of music include: (1) pitch: the high or low sound of a note, related to the frequency of vibration; (2) loudness: the volume of a note, related to the amplitude of vibration; (3) duration: the length of time a note is played, including quarter notes, eighth notes, and sixteenth notes [10]; (4) timbre: the tone color of music, different voices and instruments produce different timbres.
In the music system, C, D, E, F, G, A, and B are pitch names, and do, re, mi, fa, sol, la, and si are the corresponding solfège syllables, as shown in Table 1.
Melody is a combination of pitch and rhythm, the soul and foundation of music, and can be divided into vocal and instrumental melodies. Melodies are built from musical phrases and developed through repetition and variation. Structurally, a melody is composed of sections, each section of several phrases, each phrase of several measures, and each measure of several motifs. The motif is the smallest unit of a melody.
Chords are combinations of two or more (usually three) notes and can be divided into triads, seventh chords, and so on, depending on the number of notes. Triads are the most common type of chord, as shown in Figure 1. In automatic music composition, the following methods are currently used.
(1) Markov chain: Notes are selected through an n-dimensional transition table to generate melodies, but training on a large dataset is required, and the method cannot learn abstract concepts in music.
(2) Genetic algorithm: Melodies or motifs are encoded as chromosomes, and new melodies are created through reproduction and selection, but composition efficiency is low because the process is strongly influenced by subjectivity.
(3) Music rules: A knowledge base containing various music rules is used to create music content, but it is influenced by existing human thinking.
(4) Deep learning: Neural networks such as RNNs and LSTMs are trained to learn music rules and generate the corresponding sequences.
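As an illustration of method (1), the following is a minimal sketch of first-order Markov chain note selection. The transition table `TRANSITIONS`, its probabilities, and the function name `generate_melody` are hypothetical and chosen only for illustration; in practice the table would be estimated from a dataset.

```python
import random

# Hypothetical first-order transition table:
# current pitch name -> {next pitch name: probability}.
# The probabilities are illustrative and not learned from any dataset.
TRANSITIONS = {
    "C": {"D": 0.5, "E": 0.3, "G": 0.2},
    "D": {"C": 0.4, "E": 0.6},
    "E": {"D": 0.3, "F": 0.3, "G": 0.4},
    "F": {"E": 0.7, "G": 0.3},
    "G": {"C": 0.5, "E": 0.3, "F": 0.2},
}


def generate_melody(start, length, rng):
    """Sample a note sequence by repeatedly drawing the next note from the table."""
    melody = [start]
    while len(melody) < length:
        options = TRANSITIONS[melody[-1]]
        next_note = rng.choices(list(options), weights=list(options.values()))[0]
        melody.append(next_note)
    return melody


# A seeded generator keeps the sketch reproducible.
melody = generate_melody("C", 8, random.Random(0))
print(" ".join(melody))
```

Because each note depends only on the note before it, such a model has no way to represent longer-range structure such as phrase repetition, which is the limitation noted in the list above.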

Chord generation method based on Transformer model

Transformer model
Currently, Hidden Markov Models (HMMs) are commonly used in chord generation, where the melody is used as the observation to predict the corresponding chord. In music, there is a long-term dependency between chords and melody, but in HMMs the current state is influenced only by previous states. The Transformer model, as a sequence generation model [11], learns better than RNNs, LSTMs, and similar models because of its self-attention mechanism, and it has performed well in natural language processing, machine translation, and related tasks [12]. Therefore, this paper studies chord generation using the Transformer model. First, the Transformer model is briefly introduced.
(1) Self-attention mechanism
Self-attention is the most critical component of the Transformer model. It is assumed that there are a query vector q, a key vector k, and a value vector v, which are packed into matrices Q, K, and V. The self-attention mechanism is calculated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V, \tag{1}$$
where $d_k$ is the dimension of vector k.
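The self-attention computation can be sketched in a few lines of plain Python, assuming the standard scaled dot-product formulation softmax(QK^T / sqrt(d_k))V. The function names `softmax` and `self_attention` and the toy matrices are illustrative assumptions, not part of the method described in this paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(Q, K, V):
    """Scaled dot-product attention for matrices given as lists of rows."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value rows.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return output


# Toy example: two positions with two-dimensional vectors.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = self_attention(Q, K, V)
print(result)
```

Each output row is a convex combination of the value rows, so the first position, whose query matches the first key, is pulled toward the first value row.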
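(2) Multi-head attention
Composed of multiple self-attentions, it can effectively improve the training speed of the model. The calculation formulas are:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_J)W^{O}, \tag{2}$$
$$\mathrm{head}_j = \mathrm{Attention}(QW_j^{Q}, KW_j^{K}, VW_j^{V}), \tag{3}$$
where $J$ is the number of heads and $W^{O}$, $W_j^{Q}$, $W_j^{K}$, and $W_j^{V}$ are learnable projection matrices.
(3) Position embedding
It is used for capturing the position information of a sequence, and its calculation formulas are:
$$PE(p, 2i) = \sin\left(\frac{p}{10000^{2i/d_{model}}}\right), \tag{4}$$
$$PE(p, 2i+1) = \cos\left(\frac{p}{10000^{2i/d_{model}}}\right), \tag{5}$$
where $p$ is the position index, $i$ indexes the dimension pairs, and $d_{model}$ is the dimension of a vector.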
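The sinusoidal position embedding can be sketched as follows, assuming the standard formulation in which even dimensions use the sine and odd dimensions the cosine of p / 10000^(2i / d_model). The function name `positional_encoding` and the small sizes used here are illustrative assumptions.

```python
import math


def positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position embeddings."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for p in range(seq_len):
        # `even` plays the role of 2i in the formulas.
        for even in range(0, d_model, 2):
            angle = p / (10000 ** (even / d_model))
            pe[p][even] = math.sin(angle)
            if even + 1 < d_model:
                pe[p][even + 1] = math.cos(angle)
    return pe


pe = positional_encoding(4, 6)
for row in pe:
    print([round(x, 3) for x in row])
```

Position 0 always maps to alternating 0 and 1 values, and later positions rotate through sinusoids whose wavelengths grow with the dimension index.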
The Transformer model has an encoder-decoder structure. The encoder includes a multi-head attention and a Feedforward Neural Network (FNN), while the decoder includes a multi-head attention, an attention mechanism from the encoder to the decoder, and an FNN. In this structure, if the input is $X^t$, it is transformed into a hidden state $H^t$ after passing through the encoder. After decoding, the output is $Y^t$. The calculation formulas of multi-head attention and FNN are as follows:
$$\mathrm{MultiHead}'(Q, K, V) = \mathrm{LayerNorm}(\mathrm{MultiHead}(Q, K, V) + Q), \tag{6}$$
$$\mathrm{FNN}'(x) = \mathrm{LayerNorm}(\mathrm{FNN}(x) + x), \tag{7}$$
where LayerNorm is the layer normalization operation.

Chord generation method
Chords have characteristics similar to those of melodies: each chord is influenced by the chords before and after it, and chord sequences are sequential and repetitive. To fully learn the information of the preceding and following chords, this paper designs a bidirectional Transformer model that learns the information before and after the current state separately. Moreover, music generally adopts a verse-chorus structure, and the verse and chorus differ greatly in melody. Therefore, this paper uses two bidirectional Transformer models to generate the chords of the verse and the chorus, respectively. There is also a transition between the verse and the chorus, so a self-attention mechanism is added to the verse chord generation model to learn this transition. The generation of chord structure is divided into two main elements.
First, for chord coloring, i.e., the pitch composition of a chord, the input is assumed to include the chord sequence {…}. The sequential modeling layer is composed of multi-head attention and is calculated by Eqs. (8)-(10).
[Eqs. (8)-(10): not legible in the source; the surviving fragments show the J head outputs concatenated with ⊕ and added to $h^{(l-1)}$.]
Here $l$ stands for the iteration step, 2 here; $\sigma$ stands for the ReLU activation function; $W_{outer}$, $W_{inner}$, and $W_j^{Q}$ are learnable parameters; and $J$ is the number of heads in multi-head attention, 8 here. The loss function, Eq. (11), is not legible in the source; it involves a binary cross-entropy term and a categorical cross-entropy term computed against the ground-truth values of the predicted quantities.
Then, for chord voicing, i.e., the spacing between the pitches of a chord and their repetition, the inputs are assumed to be the root sequence $\{b_i^{\tau}\}_{i=1}^{T}$, the pitch sequence $\{p_i^{\tau}\}_{i=1}^{T}$, and the duration sequence $\{d_i\}_{i=1}^{T}$, where $v$ denotes voicing. The chord voicing $\{v_i\}_{i=1}^{T}$ needs to be predicted. The model uses the same structure as the chord coloring model [equations not legible in the source], where $W_e^{v}$ and $W_v$ are learnable parameters. If the target voicing of the i-th chord is $v'_i$, the corresponding loss function is defined on $v_i$ and $v'_i$ [equation not legible in the source]. The output of the coloring model is used as input to the voicing model, which generates a sequence with chord structure from the colored sequence.

Experiment and analysis
One hundred pieces of classical guitar music were collected for the experiment. To verify the effectiveness of the chord generation method designed in this paper, only the melody and guitar chords were retained. In addition, repeated alternations of verse and chorus were removed, so that only one verse section and one chorus section of each song were kept for the experiment. The piano roll representation was used to represent the chords: a value is 1 if the pitch is included in the chord structure and 0 otherwise. For example, for the G chord in the key of C major (Figure 2), the chord representation is shown in Table 2.
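The binary piano roll encoding can be sketched as below. This is an illustration under stated assumptions: it uses a 12-element pitch-class vector indexed from C, which may not match the exact layout of Table 2, and the names `PITCH_CLASSES` and `chord_to_piano_roll` are hypothetical.

```python
# Assumed ordering of the 12 pitch classes, starting from C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_to_piano_roll(chord_notes):
    """Encode a chord as a binary vector: 1 if the pitch class is in the chord, else 0."""
    members = set(chord_notes)
    unknown = members - set(PITCH_CLASSES)
    if unknown:
        raise ValueError(f"unknown pitch names: {sorted(unknown)}")
    return [1 if name in members else 0 for name in PITCH_CLASSES]


# The G major triad consists of the notes G, B, and D.
g_major = chord_to_piano_roll(["G", "B", "D"])
print(g_major)
```

A representation covering several octaves would extend the vector rather than change the principle: each position still marks whether a pitch sounds in the chord.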
0 0 1 0 0 0 1 0 0 1 0 0 0 … …
During preprocessing, the chords were split into eight-bar segments. The repetition number of the bidirectional Transformer model was set to 0, the dimension was set to 128, the initial learning rate was set to 0.0001, and the batch size was set to 16. There is currently no scientific and objective evaluation method for automatic music composition, so subjective evaluation methods are usually used. Fifty people participated in the evaluation, including ten music professionals who majored in music and have experience in music composition, and 40 ordinary college students. The scoring system was a five-point scale, with 5 being the highest and 1 being the lowest. The scoring criteria were as follows: (1) chord coherence: the degree of coherence of chord transitions; (2) chord pleasantness: the listenability of the chords; (3) chord creativity: the creativity and innovation of the generated chords.
To assess the chord generation method proposed in this paper, it was compared with the HMM method [13] and the LSTM method [14], and the results are shown in Table 3. First, the scores given by ordinary college students were slightly higher than those given by music professionals. This may be because the professionals have a deeper understanding of music and therefore evaluate chords more strictly from a professional perspective, resulting in lower scores. Second, the scores of the chords generated by the Transformer-based method were significantly higher than those of the chords generated by the HMM and LSTM, and music professionals and ordinary college students made consistent evaluations. Music professionals felt that the chords generated by the HMM had poor creativity and gave them an average score of only 1.8. The scores for coherence and pleasantness were also not high. The chords generated by the LSTM performed slightly better than those generated by the HMM, but their coherence, pleasantness, and creativity scores did not exceed 3.0 points. The Transformer-based method scored 3.5 for creativity and 3.6 and 3.8 for coherence and pleasantness, respectively, which were significantly higher than the scores of the HMM and LSTM methods. Although the scores that ordinary college students gave the HMM- and LSTM-generated chords were slightly higher than those given by music professionals, they still fell short of the scores of the chords generated by the Transformer-based method. The students gave a score of 4.0 for the pleasantness of the chords generated by the Transformer-based method, indicating that these chords had good listenability.
The evaluation results were further analyzed using the weighted average method. The scores for coherence, pleasantness, and creativity were combined in a ratio of 5:3:2, and the scores given by music professionals and ordinary college students were combined in a ratio of 6:4. The final evaluation results are shown in Figure 3. After combining the three criteria, music professionals gave a score of 2.07 for the chords generated by the HMM, 2.52 for those generated by the LSTM, and 3.64 for those generated by the Transformer-based method. The score of the Transformer-based method was 1.57 points higher than that of the HMM and 1.12 points higher than that of the LSTM. Ordinary college students gave a score of 3.17 for the chords generated by the HMM, 3.58 for those generated by the LSTM, and 3.91 for those generated by the Transformer-based method. The score of the Transformer-based method was 0.74 points higher than that of the HMM and 0.33 points higher than that of the LSTM. This indicates that the chords generated by the proposed method were of higher quality from the perspective of both music professionals and ordinary listeners. Finally, the total score was 2.51 for the chords generated by the HMM and 2.94 for those generated by the LSTM. The total score for the chords generated by the Transformer-based method was 3.75, which was 1.24 points higher than the HMM and 0.81 points higher than the LSTM. These results support the reliability of the Transformer-based method.
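The weighted averaging can be checked with a short calculation. The sketch below uses only scores quoted in the text (the professionals' per-criterion scores for the Transformer-based method and the per-group combined scores for all three methods); the helper name `weighted_average` and the dictionary layout are illustrative assumptions.

```python
CRITERIA_WEIGHTS = (5, 3, 2)  # coherence : pleasantness : creativity
GROUP_WEIGHTS = (6, 4)        # music professionals : ordinary college students


def weighted_average(values, weights):
    """Weighted mean of values, normalized by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Professionals' scores for the Transformer-based method:
# coherence 3.6, pleasantness 3.8, creativity 3.5.
professional_transformer = weighted_average((3.6, 3.8, 3.5), CRITERIA_WEIGHTS)

# Per-group combined scores quoted in the text: (professionals, students).
group_scores = {
    "HMM": (2.07, 3.17),
    "LSTM": (2.52, 3.58),
    "Transformer": (3.64, 3.91),
}
total_scores = {
    method: round(weighted_average(scores, GROUP_WEIGHTS), 2)
    for method, scores in group_scores.items()
}

print(round(professional_transformer, 2))
print(total_scores)
```

Running the calculation gives 3.64 for the professionals' combined Transformer score and total scores of 2.51, 2.94, and 3.75, which agree with the values reported above.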

Discussion
The development of computers has driven innovation in music technology, and automatic music composition has developed rapidly and been applied to various fields of music production. However, automatic music composition still faces some limitations. Creating complex music remains difficult because algorithms lack the emotion and creativity of human artists, and the algorithmic models all rely on a large amount of training data to achieve high-quality compositions. Deep learning methods can adapt to different data mining tasks through self-learning and parameter adjustment and have significant advantages in processing high-dimensional and complex data, so they are also well suited to automatic music composition. In this paper, a two-layer bidirectional Transformer chord generation model based on deep learning was designed and tested with guitar music as an example.
According to the results, the chords generated by the trained bidirectional Transformer achieved good results in the subjective evaluation. Compared with the HMM and LSTM, the two-layer bidirectional Transformer learned musical features more fully, so the generated chords were closer to human compositions and received higher scores from both music professionals and ordinary college students. Notably, the chord creativity score was lower than the chord coherence and chord pleasantness scores, which indicates that, as in current studies, the chords obtained by the algorithm are still deficient in terms of innovation.
The research in this paper mainly focuses on chord generation. This paper separated the melody and chords of music, addressed the current shortcomings of automatic music composition in chord generation, and obtained more reliable results, contributing to the progress of automatic music composition. In future research, the study of music theory and artistic creation will be strengthened to further improve the algorithm's ability to learn musical tonality and emotion, so that the algorithm can simulate the thinking and creative process of human composers and improve the innovation of automatic music composition, thus promoting its rich and diverse development.

Conclusion
In this paper, a two-layer bidirectional chord generation method was designed using the Transformer model in deep learning to generate chords for both the verse and chorus sections of a song. Experimental analysis revealed that the chords generated by the Transformer-based method exhibited better performance in terms of coherence, pleasantness, and creativity compared to the HMM and LSTM methods. Both music professionals and ordinary college students gave higher scores for the chords generated by the proposed method, demonstrating the effectiveness of the method. This method can be further promoted and applied to practical automatic music composition.
Informatica 47 (2023) 89-94, M. Zhu

Figure 1: Triads
Harmony is the combination of two or more voices and has two aspects: chords, which are the vertical structure of harmony, and harmonic progressions, which are the horizontal movement of harmony created by connecting chords.

Figure 2: The G chord in the key of C major.

Figure 3: The comprehensive comparison of three chord generation methods.

Table 2: Examples of chord representation …