Abstract
Sentiment analysis using machine learning has received considerable attention in recent years. It is a critical and challenging task that requires highly accurate models. This study used the IMDb movie reviews dataset, which comprises 50,000 English reviews (25,000 for training and 25,000 for testing) split equally between positive and negative classes. The dataset's characteristics, such as spelling errors, varying text lengths, and abbreviations, call for a multi-phase, unconventional approach to sentiment analysis. The data were thoroughly preprocessed by removing unwanted characters, correcting slang, removing stop words, tokenizing, stemming, and performing part-of-speech tagging. Two separate word embedding models, GloVe and Word2Vec, were then used for vectorization. An Echo State Network (ESN) was used to classify each review into one of the two sentiments, positive or negative. The network was then optimized with the Augmented Water Cycle Algorithm (AWCA), which tuned its hyperparameters. With GloVe, the proposed ESN-AWCA achieved an F1-score of 96.37%, accuracy of 96.39%, recall of 95.87%, and precision of 96.87%. With Word2Vec, it achieved an F1-score of 96.23%, accuracy of 96.12%, recall of 95.76%, and precision of 96.71%. Overall, the proposed ESN-AWCA model performed strongly with both word embedding methods and outperformed the other models evaluated in the study. Statistical validation (p value of 0.001 and effect sizes d > 1.1) supported the superiority of the proposed model.
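To make the ESN component concrete, the following is a minimal sketch of a leaky-integrator echo state classifier in Python with NumPy. It is not the implementation used in this study. The class name, the helper `make_toy_sequences`, the ridge-regression readout, and all hyperparameter values are illustrative assumptions. Synthetic vectors stand in for GloVe or Word2Vec embeddings, and AWCA is not implemented. Values such as the reservoir size, spectral radius, and leak rate are set by hand here; they are the kind of settings a metaheuristic optimizer could tune.

```python
import numpy as np


class EchoStateClassifier:
    """Illustrative leaky-integrator ESN with a ridge-regression readout."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 leak_rate=0.3, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Input and reservoir weights are random and never trained.
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_inputs))
        w = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
        # Rescale so the largest eigenvalue magnitude equals spectral_radius.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w_res = w
        self.leak_rate = leak_rate
        self.ridge = ridge
        self.w_out = None

    def _final_state(self, sequence):
        """Run one (timesteps, n_inputs) sequence; return the last state."""
        state = np.zeros(self.w_res.shape[0])
        for x_t in sequence:
            pre = self.w_in @ x_t + self.w_res @ state
            state = (1 - self.leak_rate) * state + self.leak_rate * np.tanh(pre)
        return state

    def _features(self, sequences):
        states = np.stack([self._final_state(s) for s in sequences])
        # Append a constant column so the readout has a bias term.
        return np.hstack([states, np.ones((len(states), 1))])

    def fit(self, sequences, labels):
        h = self._features(sequences)
        y = 2.0 * np.asarray(labels, dtype=float) - 1.0  # {0, 1} -> {-1, +1}
        # Only the linear readout is trained, in closed form (ridge regression).
        a = h.T @ h + self.ridge * np.eye(h.shape[1])
        self.w_out = np.linalg.solve(a, h.T @ y)
        return self

    def predict(self, sequences):
        return (self._features(sequences) @ self.w_out > 0).astype(int)


def make_toy_sequences(n, timesteps=12, dim=8, seed=0):
    """Hypothetical stand-in for embedded reviews: class-shifted noise."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    shift = (2 * labels - 1)[:, None, None] * 0.5
    return rng.normal(size=(n, timesteps, dim)) + shift, labels


train_x, train_y = make_toy_sequences(300, seed=1)
test_x, test_y = make_toy_sequences(100, seed=2)
model = EchoStateClassifier(n_inputs=8).fit(train_x, train_y)
print("held-out accuracy:", np.mean(model.predict(test_x) == test_y))
```

The design choice this sketch illustrates is that only the readout weights are fitted, while the reservoir stays fixed. Training is therefore cheap, so an outer search over the hyperparameters can retrain the model many times.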