Abstract
Interest in music genre classification has grown alongside advances in multimedia technology, driven primarily by the goal of creating playlists that suggest songs. Machine learning and deep learning techniques are needed to build a strong music classifier that can quickly classify unlabeled music and enhance users' experiences with media players and music files. This study presents a novel method that combines convolutional neural network (CNN) models into an ensemble system for detecting musical genres. The method uses discrete wavelet transform (DWT), mel-frequency cepstral coefficient (MFCC), and short-time Fourier transform (STFT) features to provide a comprehensive framework for expressing stylistic qualities in music. To this end, each model's hyperparameters are tuned using the Capuchin Search Algorithm (CapSA). The technique comprises four main components: preprocessing of the original signals, feature description using DWT, MFCC, and STFT signal matrices, optimization of the CNN models to extract signal features, and music genre identification based on the combined features. By integrating multiple signal processing techniques and CNN models, this study advances the field of music genre classification and offers insights into combining diverse musical representations for improved classification accuracy. Experiments were conducted on two datasets, GTZAN and Extended-Ballroom. Average classification accuracies of 96.07% and 96.20% on these datasets, respectively, show that the proposed approach performs well compared to earlier, comparable methods.
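As a minimal sketch of the three feature representations named above, the following Python fragment extracts MFCC, STFT, and DWT matrices from an audio file using librosa and PyWavelets. The parameter choices (n_mfcc=20, n_fft=2048, hop_length=512, 'db4' wavelet, 4 decomposition levels) are illustrative assumptions, not the settings reported in this work.

import librosa
import numpy as np
import pywt

def extract_features(path, sr=22050):
    # Load the audio signal at a fixed sampling rate (assumed value).
    y, sr = librosa.load(path, sr=sr)

    # MFCC matrix: compact cepstral description of the spectral envelope.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

    # STFT magnitude matrix: time-frequency energy distribution.
    stft = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))

    # DWT coefficients: multi-resolution decomposition of the raw waveform.
    dwt_coeffs = pywt.wavedec(y, wavelet='db4', level=4)

    return mfcc, stft, dwt_coeffs

In the proposed pipeline, matrices such as these would serve as the inputs to the CapSA-tuned CNN models whose outputs are combined by the ensemble.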