Evaluating sentiment analysis models: A comparative analysis of vaccination tweets during the COVID-19 phase leveraging DistilBERT for enhanced insights

评估情感分析模型:利用 DistilBERT 对 COVID-19 疫情期间的疫苗接种推文进行比较分析,以增强洞察力

阅读:1

Abstract

This study investigates public sentiment toward COVID-19 vaccinations by analyzing Twitter data using advanced machine learning (ML) and natural language processing (NLP) techniques. Recognizing social media as a valuable source for gauging public opinion during health crises, the research aims to inform policies on content moderation and misinformation control.•Comparative Analysis of Embedding Techniques and ML Models: The study evaluates two embedding techniques-TF-IDF and Word2Vec-across five ML models: LinearSVC, Random Forest, Gradient Boosting Machine (GBM), XGBoost, and AdaBoost.•The models were tested using two training-testing splits (70-30 and 80-20) to assess their performance on noisy, unlabeled, and imbalanced sentiment data.•Utilization of DistilBERT for Pseudo-Labeling: To enhance labeling accuracy, DistilBERT was employed for pseudo-labeling, capturing semantic nuances often missed by traditional ML techniques. This approach enabled more effective sentiment classification of tweets. The findings underscore the effectiveness of automated annotation, hybrid modeling, and embedding strategies in analyzing unstructured social media data. Such approaches provide valuable insights for public health applications, particularly in understanding vaccine hesitancy and shaping communication strategies. The study highlights the potential of integrating advanced NLP techniques to better comprehend and respond to public sentiments during pandemics or similar emergencies.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。