T-WGAN: A Transformer-Wasserstein GAN Approach for Melody Generation with Structural and Rhythmic Fidelity

Abstract

To address the challenges that deep learning music generation models face in capturing long-range dependencies, ensuring generation diversity, and maintaining training stability, this study proposes an optimized melody generation model, T-WGAN, which tightly integrates a Transformer with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP). The study first preprocesses the large-scale Lakh MIDI dataset, extracts single-track main melodies, and converts them into symbolic sequences using a REMI-based event representation. On this basis, the model adopts a Transformer-decoder generator to learn the long-range structure of melodies and a Transformer-encoder critic for stable adversarial training under the WGAN-GP framework, enhancing the diversity and authenticity of the generated melodies. Experimental results show that T-WGAN performs well on several key evaluation metrics: it achieves a Rhythmic Consistency Rate (RCR) of 85.17%, significantly higher than the baseline models (e.g., the Transformer's 75.68%), and its Fréchet Distance for Music (FDM) drops to 31.02, indicating that the generated melodies are closer to real music in feature distribution. These results show that the proposed T-WGAN model jointly addresses the three core issues in melody generation: structural integrity, diversity, and training stability. The findings provide an effective technical approach for generating high-quality melodies that combine structural logic with innovation.
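The abstract's REMI-based event representation maps each note to a small set of symbolic tokens (bar markers, metrical positions, pitches, durations). The following is a minimal sketch of that idea, not the paper's actual preprocessing code; it assumes 4/4 time, a bar divided into 16 sixteenth-note positions, and notes supplied as hypothetical (onset, pitch, duration) triples measured in sixteenth-note steps.

```python
# Minimal REMI-style tokenizer sketch (illustrative only; the paper's exact
# vocabulary and quantization may differ).
def notes_to_remi(notes, positions_per_bar=16):
    """Convert (onset, pitch, duration) triples into a REMI-like token sequence.

    onset and duration are in sixteenth-note steps; pitch is a MIDI number.
    """
    tokens = []
    current_bar = -1
    for onset, pitch, duration in sorted(notes):
        bar = onset // positions_per_bar
        # Emit a Bar token for every new bar reached, including empty bars.
        while current_bar < bar:
            current_bar += 1
            tokens.append("Bar")
        tokens.append(f"Position_{onset % positions_per_bar}")
        tokens.append(f"Pitch_{pitch}")
        tokens.append(f"Duration_{duration}")
    return tokens

# Example: C4 at the start of bar 0, then E4 on position 4 of bar 1.
melody = [(0, 60, 4), (20, 64, 2)]
print(notes_to_remi(melody))
# ['Bar', 'Position_0', 'Pitch_60', 'Duration_4',
#  'Bar', 'Position_4', 'Pitch_64', 'Duration_2']
```

Representing position explicitly (rather than as time-shift deltas) is the design choice that gives REMI-style sequences their metrical grounding, which is what allows a decoder-only Transformer to learn bar-level rhythmic regularities.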


Authors

  • Chunxiao Zhao, Zhengzhou Health College, Ministry of Health and Humanities Education

DOI:

https://doi.org/10.31449/inf.v50i8.11624

Published

02/21/2026

How to Cite

Zhao, C. (2026). T-WGAN: A Transformer-Wasserstein GAN Approach for Melody Generation with Structural and Rhythmic Fidelity. Informatica, 50(8). https://doi.org/10.31449/inf.v50i8.11624