Deep Learning-Based Music Generation

Research Article
Open access

Gening Lu 1*
  • 1 Hangzhou Dianzi University
  • *Corresponding author: lugn@hdu.edu.cn
Published on 1 August 2023 | https://doi.org/10.54254/2755-2721/8/20230188
ACE Vol.8
ISSN (Print): 2755-273X
ISSN (Online): 2755-2721
ISBN (Print): 978-1-915371-63-8
ISBN (Online): 978-1-915371-64-5

Abstract

Music is one of the greatest inventions in human history. Traditionally, music composition is time-consuming and complex, requiring mastery of music theory as well as musical intuition. In recent decades, deep learning has been applied to music generation, progressing from simple sequence generation to multi-track generation that takes musicality into account. Many methods have been explored to generate better music and to combine existing music theory with deep learning technology, and current techniques already allow people to compose easily, even without domain knowledge or massive manpower. This paper offers an overview of the automatic music generation task, covering the majority of currently popular deep learning-based music generation models. The latter sections also discuss how to unify objective evaluation criteria for music, which is inherently subjective, identify existing deficiencies, and outline possible future directions. This review offers significant reference value for the future development of music generation.
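To make the idea of sequence-based generation concrete, the sketch below samples a melody token by token in the autoregressive style used by the RNN and Transformer models this survey covers. The `toy_model` function is a purely illustrative stand-in: a real system would replace it with a trained neural network that outputs a probability distribution over the next symbolic event.

```python
import random

# Toy autoregressive melody sampler. Real systems replace `toy_model`
# with a trained network; the sampling loop itself is the same idea.

PITCHES = list(range(60, 72))  # one octave of MIDI note numbers (C4-B4)

def toy_model(history):
    """Stand-in for a trained model: favour small melodic steps."""
    if not history:
        return {p: 1.0 / len(PITCHES) for p in PITCHES}
    last = history[-1]
    weights = {p: 1.0 / (1 + abs(p - last)) for p in PITCHES}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}

def generate(length, seed=0):
    """Sample a pitch sequence one event at a time (autoregressively)."""
    rng = random.Random(seed)
    history = []
    for _ in range(length):
        dist = toy_model(history)
        pitches, probs = zip(*dist.items())
        history.append(rng.choices(pitches, weights=probs)[0])
    return history

melody = generate(16)
```

Multi-track generation extends this loop so that several such sequences (melody, bass, drums, and so on) are generated jointly and kept musically consistent with one another.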

Keywords:

music generation, multi-track, deep learning


References

[1]. D. Cope, "Experiments in musical intelligence (EMI): Non‐linear linguistic‐based composition," Interface, vol. 18, no. 1–2, pp. 117–139, Jan. 1989, doi: 10.1080/09298218908570541.

[2]. G. Hadjeres, F. Pachet, and F. Nielsen, "DeepBach: a Steerable Model for Bach Chorales Generation," in Proceedings of the 34th International Conference on Machine Learning (ICML), 2017, pp. 1362–1371. doi: 10.48550/arXiv.1612.01010.

[3]. H. H. Mao, T. Shin, and G. Cottrell, "DeepJ: Style-Specific Music Generation," in 2018 IEEE 12th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, Jan. 2018, pp. 377–382. doi: 10.1109/ICSC.2018.00077.

[4]. G. Barina, A. Topirceanu, and M. Udrescu, "MuSeNet: Natural patterns in the music artists industry," in Proceedings of the 9th IEEE International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, May 2014, pp. 317–322. doi: 10.1109/SACI.2014.6840084.

[5]. C.-Z. A. Huang et al., "Music Transformer: Generating Music with Long-Term Structure." arXiv, Dec. 12, 2018. Accessed: Feb. 07, 2023. [Online]. Available: http://arxiv.org/abs/1809.04281

[6]. S. Ji, J. Luo, and X. Yang, "A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions," J. ACM, Nov. 2020, [Online]. Available: http://arxiv.org/abs/2011.06801.

[7]. A. Nayebi and M. Vitelli, "GRUV: Algorithmic Music Generation using Recurrent Neural Networks." 2015. [Online]. Available: http://cs224d.stanford.edu/reports/NayebiAran.pdf.

[8]. M. Bretan, G. Weinberg, and L. Heck, "A Unit Selection Methodology for Music Generation Using Deep Neural Networks." arXiv, Dec. 12, 2016. Accessed: Feb. 05, 2023. [Online]. Available: http://arxiv.org/abs/1612.03789.

[9]. E. Waite, "Generating long-term structure in songs and stories," Magenta blog, Jul. 15, 2016. [Online]. Available: https://magenta.tensorflow.org/2016/07/15/lookback-rnn-attention-rnn

[10]. A. Roberts, J. Engel, C. Raffel, C. Hawthorne, and D. Eck, "A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music." arXiv, Nov. 11, 2019. Accessed: Jan. 31, 2023. [Online]. Available: http://arxiv.org/abs/1803.05428.

[11]. J. Jiang, G. G. Xia, D. B. Carlton, C. N. Anderson, and R. H. Miyakawa, "Transformer VAE: A Hierarchical Model for Structure-Aware and Interpretable Music Representation Learning," in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, May 2020, pp. 516–520. doi: 10.1109/ICASSP40776.2020.9054554.

[12]. P. Salim, Gerardo M., and Sarria M., "Musical Composition with Stochastic Context-Free Grammars," in Proceedings of the 8th Mexican International Conference on Artificial Intelligence, 2016. [Online]. Available: https://hal.inria.fr/hal-01257155. Accessed: Apr. 05, 2021.

[13]. S. Lattner, M. Grachten, and G. Widmer, "Imposing Higher-Level Structure in Polyphonic Music Generation Using Convolutional Restricted Boltzmann Machines and Constraints," Journal of Creative Music Systems, vol. 2, no. 2, Mar. 2018, doi: 10.5920/jcms.2018.01.

[14]. S. Dai, Z. Jin, C. Gomes, and R. B. Dannenberg, "Controllable deep melody generation via hierarchical music structure representation." arXiv, Sep. 01, 2021. Accessed: Feb. 24, 2023. [Online]. Available: http://arxiv.org/abs/2109.00663.

[15]. N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription," in Proceedings of the 29th International Conference on Machine Learning (ICML), Jun. 2012.

[16]. Z. Wang, Y. Zhang, Y. Zhang, J. Jiang, R. Yang, J. Zhao, et al., "PIANOTREE VAE: Structured Representation Learning for Polyphonic Music." arXiv, Aug. 17, 2020. Accessed: Feb. 06, 2023. [Online]. Available: http://arxiv.org/abs/2008.07118.

[17]. G. Brunner, A. Konrad, Y. Wang, and R. Wattenhofer, "MIDI-VAE: Modeling Dynamics and Instrumentation of Music with Applications to Style Transfer." arXiv, Sep. 20, 2018. Accessed: Feb. 07, 2023. [Online]. Available: http://arxiv.org/abs/1809.07600.

[18]. H. W. Dong, W. Y. Hsiao, L. C. Yang, and Y. H. Yang, "MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment." arXiv, Nov. 24, 2017. Accessed: Jan. 31, 2023. [Online]. Available: http://arxiv.org/abs/1709.06298.

[19]. A. Valenti, A. Carta, and D. Bacciu, "Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders," in Proceedings of the 24th European Conference on Artificial Intelligence (ECAI), Feb. 2020. doi: 10.48550/arXiv.2001.05494.

[20]. C. Jin et al., "A transformer generative adversarial network for multi‐track music generation," CAAI Trans on Intel Tech, vol. 7, no. 3, pp. 369–380, Sep. 2022, doi: 10.1049/cit2.12065.

[21]. J. Liu et al., "Symphony Generation with Permutation Invariant Language Model." arXiv, Sep. 16, 2022. doi: 10.48550/arXiv.2205.05448.

[22]. E. R. Miranda, R. Yeung, A. Pearson, K. Meichanetzidis, and B. Coecke, "A Quantum Natural Language Processing Approach to Musical Intelligence." arXiv, Dec. 09, 2021. doi: 10.48550/arXiv.2111.06741.


Cite this article

Lu, G. (2023). Deep Learning-Based Music Generation. Applied and Computational Engineering, 8, 366-379.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

Disclaimer/Publisher's Note

The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of EWA Publishing and/or the editor(s). EWA Publishing and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

About volume

Volume title: Proceedings of the 2023 International Conference on Software Engineering and Machine Learning

ISBN: 978-1-915371-63-8 (Print) / 978-1-915371-64-5 (Online)
Editors: Anil Fernando, Marwan Omar
Conference website: http://www.confseml.org
Conference date: 19 April 2023
Series: Applied and Computational Engineering
Volume number: Vol. 8
ISSN: 2755-2721 (Print) / 2755-273X (Online)

© 2024 by the author(s). Licensee EWA Publishing, Oxford, UK. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license. Authors who publish in this series agree to the following terms:
1. Authors retain copyright and grant the series right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this series.
2. Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the series's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this series.
3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See Open access policy for details).
