Advancing deep learning for expressive music composition and performance modeling

Abstract

The pursuit of expressive, human-like music generation remains a significant challenge in artificial intelligence (AI). While deep learning has advanced AI music composition and transcription, current models often struggle with long-term structural coherence and emotional nuance. This study presents a comparative analysis of three leading deep learning architectures for AI-generated music composition and transcription, evaluated on the MAESTRO dataset: Long Short-Term Memory (LSTM) networks, Transformer models, and Generative Adversarial Networks (GANs). Our key innovation is a dual evaluation framework that combines objective metrics (perplexity, harmonic consistency, and rhythmic entropy) with subjective human evaluation via a Mean Opinion Score (MOS) study involving 50 listeners. The Transformer model achieved the best overall performance (perplexity: 2.87; harmonic consistency: 79.4%; MOS: 4.3), indicating a superior ability to produce musically rich and expressive output. Human compositions, however, remained highest in perceptual quality (MOS: 4.8). Our findings provide a benchmarking foundation for future AI music systems and underscore the need for emotion-aware modeling, real-time human-AI collaboration, and reinforcement learning to bridge the gap between machine-generated and human-performed music.
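The abstract names its objective metrics but does not spell out how they are computed. The sketch below shows one standard reading, assuming perplexity is taken as the exponential of the mean negative log-likelihood over predicted MIDI event tokens, rhythmic entropy as the Shannon entropy of the quantized inter-onset-interval distribution, and MOS as the arithmetic mean of 1-5 listener ratings. All function names and the quantization step are illustrative, not drawn from the study; harmonic consistency is omitted because its definition is paper-specific.

```python
import math
from collections import Counter

def perplexity(neg_log_likelihoods):
    """Perplexity = exp(mean negative log-likelihood).

    `neg_log_likelihoods` holds -ln p(token_t | context) for each
    predicted MIDI event token in a generated sequence.
    """
    return math.exp(sum(neg_log_likelihoods) / len(neg_log_likelihoods))

def rhythmic_entropy(onset_times, quantize=0.125):
    """Shannon entropy (bits) of the inter-onset-interval distribution.

    Consecutive note-onset intervals are quantized to `quantize`-second
    bins before counting, so identical rhythms map to the same symbol.
    """
    intervals = [round((b - a) / quantize)
                 for a, b in zip(onset_times, onset_times[1:])]
    counts = Counter(intervals)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_opinion_score(ratings):
    """MOS: arithmetic mean of 1-5 listener ratings."""
    return sum(ratings) / len(ratings)

# A perfectly regular pulse carries no rhythmic information:
print(rhythmic_entropy([0.0, 0.5, 1.0, 1.5, 2.0]))  # 0.0
```

Under this reading, lower perplexity means the model assigns higher probability to the reference continuation, while rhythmic entropy rewards neither extreme: a metronomic pulse scores 0 and random onsets score near the maximum, so human-like playing typically falls in between.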
