GAN-Based Model for Multi-Instrument Collaborative Music Generation Using Deep Learning

Abstract

The intelligent development of music creation has promoted the application of artificial intelligence to multi-instrument collaborative composition. In this study, we propose a multi-instrument music generation model based on a conditional Generative Adversarial Network (cGAN) that explicitly learns the performance patterns of different instruments and their coordination. The model is trained on a dataset of 19,000 multi-instrument music excerpts collected from MuseScore, Magenta, Spotify and a self-built corpus, covering classical, pop, jazz, electronic and orchestral styles. Audio is converted to a unified format and sampling rate, denoised, and represented by a fused feature set that combines short-time Fourier transform (STFT) spectrograms with Mel-frequency cepstral coefficients (MFCCs) to capture both harmonic structure and timbral characteristics. The generator adopts a multi-layer convolutional and transposed-convolutional architecture conditioned on instrument labels to synthesize multi-track audio segments, while a multi-branch discriminator jointly evaluates global musical coherence, instrument-wise timbre consistency and style conformity. Model parameters are optimized with gradient-based training combined with a genetic search over key hyperparameters to enhance training stability and audio realism. Quantitative experiments show that the proposed model achieves a mean pitch prediction error of 0.42 semitones, a chord recognition accuracy of 92.3%, and a rhythm synchronization rate of 95.1% across common instrument combinations such as piano–violin and guitar–bass. Subjective listening tests with 20 experienced musicians report average scores of 4.3/5 for melody fluency, 4.2/5 for timbre matching and 4.1/5 for perceived instrument coordination. The model performs particularly well in generating melodically fluent lines, harmonically consistent chord progressions and rhythmically stable ensemble parts, and more accurately simulates collaborative performance effects among different instruments. However, there remains room for improvement in handling highly complex chord transformations and in integrating electronic synthesizer timbres with traditional instruments. Moreover, computational cost and training stability still constrain large-scale practical deployment, indicating that improving generation efficiency and robustness is an important direction for enhancing the application value of AI-based multi-instrument music composition models.
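To make the abstract's description of the generator concrete, the following is a minimal sketch of a label-conditioned transposed-convolutional generator in PyTorch. It reflects only the design outlined above (instrument-label conditioning feeding a stack of transposed convolutions that emit a spectrogram-like output); all layer sizes, names such as ConditionalGenerator, and the 64x64 output shape are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only: a conditional GAN generator that upsamples a noise
# vector, conditioned on an instrument label, into a spectrogram-like patch.
# Dimensions and layer counts are assumed, not taken from the paper.
import torch
import torch.nn as nn


class ConditionalGenerator(nn.Module):
    def __init__(self, noise_dim=128, num_instruments=8, embed_dim=32,
                 base_channels=64, out_channels=1):
        super().__init__()
        self.base_channels = base_channels
        # Embed the instrument label and concatenate it with the noise vector.
        self.label_embed = nn.Embedding(num_instruments, embed_dim)
        self.project = nn.Linear(noise_dim + embed_dim,
                                 base_channels * 8 * 4 * 4)
        # Transposed convolutions upsample the projected seed to a 2-D
        # (frequency x time) output, mirroring the abstract's description.
        self.upsample = nn.Sequential(
            nn.ConvTranspose2d(base_channels * 8, base_channels * 4, 4, 2, 1),
            nn.BatchNorm2d(base_channels * 4),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 4, base_channels * 2, 4, 2, 1),
            nn.BatchNorm2d(base_channels * 2),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels * 2, base_channels, 4, 2, 1),
            nn.BatchNorm2d(base_channels),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base_channels, out_channels, 4, 2, 1),
            nn.Tanh(),  # output normalized to [-1, 1]
        )

    def forward(self, noise, instrument_label):
        cond = self.label_embed(instrument_label)            # (B, embed_dim)
        seed = self.project(torch.cat([noise, cond], dim=1))  # (B, C*8*4*4)
        seed = seed.view(-1, self.base_channels * 8, 4, 4)
        return self.upsample(seed)                            # (B, 1, 64, 64)


if __name__ == "__main__":
    # Example: two samples conditioned on a hypothetical label index 3.
    gen = ConditionalGenerator()
    z = torch.randn(2, 128)
    labels = torch.tensor([3, 3])
    print(gen(z, labels).shape)  # torch.Size([2, 1, 64, 64])
```

In the full model described by the abstract, such a generator would be paired with a multi-branch discriminator scoring global coherence, per-instrument timbre consistency and style conformity; that component is not sketched here.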

Authors

  • Yuanhang Lin, School of Music, Neijiang Normal University

DOI:

https://doi.org/10.31449/inf.v50i5.10631

Published

02/02/2026

How to Cite

Lin, Y. (2026). GAN-Based Model for Multi-Instrument Collaborative Music Generation Using Deep Learning. Informatica, 50(5). https://doi.org/10.31449/inf.v50i5.10631