GAN-Based Model for Multi-Instrument Collaborative Music Generation Using Deep Learning
Abstract
The intelligent development of music creation is driving the application of artificial intelligence to multi-instrument collaborative composition. In this study, we propose a multi-instrument music generation model based on a conditional Generative Adversarial Network (cGAN) that explicitly learns the performance patterns of different instruments and their coordination. The model is trained on a dataset of 19,000 multi-instrument music excerpts collected from MuseScore, Magenta, Spotify and a self-built corpus, covering classical, pop, jazz, electronic and orchestral styles. Audio is converted to a unified format and sampling rate, denoised, and represented by a fused feature set that combines short-time Fourier transform (STFT) spectrograms with Mel-frequency cepstral coefficients (MFCCs) to capture both harmonic structure and timbral characteristics. The generator adopts a multi-layer convolutional and transposed-convolutional architecture conditioned on instrument labels to synthesize multi-track audio segments, while a multi-branch discriminator jointly evaluates global musical coherence, instrument-wise timbre consistency and style conformity. Model parameters are optimized using gradient-based training combined with a genetic search over key hyperparameters to enhance training stability and audio realism. Quantitative experiments show that the proposed model achieves a mean pitch prediction error of 0.42 semitones, a chord recognition accuracy of 92.3%, and a rhythm synchronization rate of 95.1% across common instrument combinations such as piano–violin and guitar–bass. Subjective listening tests with 20 experienced musicians report average scores of 4.3/5 for melody fluency, 4.2/5 for timbre matching and 4.1/5 for perceived instrument coordination. The model performs particularly well in generating melodically fluent lines, harmonically consistent chord progressions and rhythmically stable ensemble parts, and accurately simulates collaborative performance effects among different instruments. However, there remains room for improvement in handling highly complex chord transformations and in integrating electronic synthesizer timbres with traditional instruments. Moreover, computational cost and training stability still constrain large-scale practical deployment, indicating that improving generation efficiency and robustness is an important direction for enhancing the application value of AI-based multi-instrument music composition models.
DOI: https://doi.org/10.31449/inf.v50i5.10631
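The abstract describes a fused feature set that stacks STFT spectrograms with MFCCs frame by frame. The following is a minimal sketch of how such a fusion might be computed with librosa; the sampling rate, FFT size, hop length and number of MFCCs are illustrative assumptions, not values reported in the paper.

```python
# Sketch of STFT + MFCC feature fusion (assumed parameters, for illustration only).
import numpy as np
import librosa

def extract_fused_features(path, sr=22050, n_fft=2048, hop=512, n_mfcc=20):
    """Load one audio excerpt and return a (frames, features) matrix that stacks
    a log-magnitude STFT spectrogram with MFCCs computed on the same frame grid."""
    y, _ = librosa.load(path, sr=sr, mono=True)            # unify the sampling rate
    stft = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop))
    log_spec = librosa.amplitude_to_db(stft, ref=np.max)   # harmonic structure
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop)  # timbral envelope
    fused = np.concatenate([log_spec, mfcc], axis=0)        # (freq bins + n_mfcc, frames)
    return fused.T
```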
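The generator and discriminator described in the abstract (a transposed-convolutional generator conditioned on instrument labels, and a multi-branch discriminator scoring global coherence, instrument-wise timbre and style) could be organized roughly as in the PyTorch sketch below. All layer sizes, the number of instrument classes and styles, and the 64x64 feature-map resolution are assumptions for illustration, not the paper's actual configuration.

```python
# Sketch of a conditional generator and multi-branch discriminator (assumed sizes).
import torch
import torch.nn as nn

N_INSTRUMENTS = 8   # assumed number of instrument labels
Z_DIM = 128         # assumed latent dimensionality

class Generator(nn.Module):
    """Maps (noise, instrument label) to a multi-track spectrogram-like tensor."""
    def __init__(self, n_tracks=2):
        super().__init__()
        self.embed = nn.Embedding(N_INSTRUMENTS, Z_DIM)
        self.net = nn.Sequential(   # 1x1 -> 64x64 via transposed convolutions
            nn.ConvTranspose2d(Z_DIM * 2, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),
            nn.ConvTranspose2d(32, n_tracks, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z, label):
        cond = torch.cat([z, self.embed(label)], dim=1)     # condition on instrument label
        return self.net(cond.unsqueeze(-1).unsqueeze(-1))

class Discriminator(nn.Module):
    """Shared convolutional trunk with three heads: global musical coherence,
    instrument-wise timbre consistency, and style conformity."""
    def __init__(self, n_tracks=2, n_styles=5):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(n_tracks, 64, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.2, True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.global_head = nn.Linear(256, 1)                # overall realism/coherence
        self.timbre_head = nn.Linear(256, N_INSTRUMENTS)    # instrument-wise timbre match
        self.style_head = nn.Linear(256, n_styles)          # style conformity

    def forward(self, x):
        h = self.trunk(x)
        return self.global_head(h), self.timbre_head(h), self.style_head(h)
```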
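The abstract also mentions combining gradient-based training with a genetic search over key hyperparameters. A minimal sketch of such an outer-loop search is given below; the search space, population size and the placeholder fitness function are assumptions, since the abstract does not specify them.

```python
# Sketch of a genetic search over GAN hyperparameters (assumed search space).
import random

SEARCH_SPACE = {
    "lr_g":  [1e-4, 2e-4, 5e-4],
    "lr_d":  [1e-4, 2e-4, 5e-4],
    "batch": [16, 32, 64],
    "z_dim": [64, 128, 256],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def train_and_score(ind):
    """Placeholder fitness: in the real setting this would train the cGAN briefly
    with these hyperparameters and return a validation realism score; here it
    returns a dummy value so the sketch runs end to end."""
    return random.random()

def genetic_search(generations=10, pop_size=8, elite=2):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=train_and_score, reverse=True)
        parents = ranked[:elite]                       # keep the best configurations
        children = [mutate(crossover(*random.sample(parents, 2)))
                    for _ in range(pop_size - elite)]
        pop = parents + children
    return max(pop, key=train_and_score)
```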