T-WGAN: A Transformer-Wasserstein GAN Approach for Melody Generation with Structural and Rhythmic Fidelity
Abstract
To address the difficulties that deep learning music generation models face in capturing long-range dependencies, ensuring generation diversity, and maintaining training stability, this study proposes an optimized melody generation model, T-WGAN, which integrates a Transformer with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The study first preprocesses the large-scale Lakh MIDI dataset, extracting single-track main melodies and converting them into symbolic sequences using a REMI-based event representation. On this basis, the model adopts a Transformer-decoder generator to learn the long-range structure of melodies, paired with a Transformer-encoder critic for stable adversarial training under the WGAN-GP framework, which enhances the diversity and authenticity of the generated melodies. Experimental results show that T-WGAN performs strongly on several key evaluation metrics: it achieves a Rhythmic Consistency Rate (RCR) of 85.17%, significantly higher than baseline models (e.g., 75.68% for a plain Transformer), and its Fréchet Distance for Music (FDM) drops to 31.02, indicating that the generated melodies are closer to real music in feature distribution. These results show that T-WGAN jointly addresses the three core issues in melody generation, namely structural integrity, diversity, and training stability, and they offer an effective technical approach for generating high-quality melodies that combine structural logic with innovation.
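As context for the training objective named in the abstract, the following is a minimal PyTorch sketch of the WGAN-GP gradient-penalty term (Gulrajani et al., 2017). It is illustrative only, not the authors' implementation: it assumes the critic scores continuous embeddings of the REMI token sequence (a common workaround for adversarial training on discrete symbolic data), and the function and variable names are hypothetical.

    import torch

    def gradient_penalty(critic, real_emb, fake_emb):
        """WGAN-GP penalty: pushes the critic's gradient norm toward 1
        on random interpolations between real and generated samples.
        real_emb, fake_emb: (batch, seq_len, embed_dim) tensors."""
        # One interpolation coefficient per sample, broadcast over sequence and embedding dims.
        alpha = torch.rand(real_emb.size(0), 1, 1, device=real_emb.device)
        interp = (alpha * real_emb + (1 - alpha) * fake_emb).requires_grad_(True)
        scores = critic(interp)  # (batch,) or (batch, 1)
        # Gradient of the critic score with respect to the interpolated input.
        grads = torch.autograd.grad(
            outputs=scores, inputs=interp,
            grad_outputs=torch.ones_like(scores),
            create_graph=True,
        )[0]
        grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
        return ((grad_norm - 1.0) ** 2).mean()

In the standard formulation, this penalty is added to the critic loss, L_critic = E[D(fake)] - E[D(real)] + lambda * GP, with lambda conventionally set to 10; whether T-WGAN uses that default is not stated in the abstract.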
DOI:
https://doi.org/10.31449/inf.v50i8.11624
License
Authors retain copyright in their work. By submitting to and publishing with Informatica, authors grant the publisher (Slovene Society Informatika) the non-exclusive right to publish, reproduce, and distribute the article and to identify itself as the original publisher.
All articles are published under the Creative Commons Attribution license CC BY 3.0. Under this license, others may share and adapt the work for any purpose, provided appropriate credit is given and changes (if any) are indicated.
Authors may deposit and share the submitted version, accepted manuscript, and published version, provided the original publication in Informatica is properly cited.