Fusing Transformer-XL with bi-directional recurrent networks for cyberbullying detection

Abstract

Identifying cyberbullying in languages other than English is difficult because of linguistic subtleties and the scarcity of annotated datasets. This article presents a new method for identifying cyberbullying in Bengali text, using a dataset from Kaggle. The method combines Transformer-XL with bi-directional recurrent neural networks (BiGRU-BiLSTM). Extensive data preparation was performed, including data cleaning, data analysis, and label encoding. Upsampling was used to handle imbalanced classes, and data augmentation enlarged the training dataset. The text was tokenized with a pre-trained tokenizer to capture semantic representations accurately. The proposed model, Transformer-XL combined with bidirectional gated recurrent units (BiGRU) and bidirectional long short-term memory (BiLSTM), which we call Fusion Transformer-XL, outperformed the baseline models, attaining an accuracy of 98.17% and an F1-score of 98.18%. Local interpretable model-agnostic explanations (LIME) for text were used to understand the reasoning behind the model's predictions, and a cross-dataset evaluation of the model was performed on an English dataset. Together, these improved the transparency and reliability of the proposed method. Furthermore, k-fold cross-validation was used to assess the model's robustness and adaptability across diverse data categories. The results demonstrate the effectiveness of combining Transformer-XL with bi-directional recurrent networks for detecting cyberbullying in Bengali, and underline the value of hybrid architectures for intricate natural language processing problems in low-resource languages. This study advances methods for detecting cyberbullying and opens up opportunities for further investigation into language diversity and social media analytics.
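The abstract mentions upsampling for class imbalance and k-fold cross-validation without giving details. The following is a minimal, standard-library sketch of what these two steps might look like, not the authors' implementation. The function names `upsample` and `stratified_kfold`, the toy texts, and the labels `"bully"` and `"neutral"` are all hypothetical. Random duplication and stratified folds are assumed here; the paper may use a different upsampling scheme or a plain (non-stratified) split.

```python
import random
from collections import Counter, defaultdict


def upsample(texts, labels, seed=0):
    """Randomly duplicate minority-class examples until every class
    is as large as the largest class (assumed scheme, see lead-in)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose class proportions are
    roughly equal across the k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train, val


# Hypothetical toy data standing in for the real dataset.
toy_texts = ["t1", "t2", "t3", "t4", "t5", "t6"]
toy_labels = ["neutral"] * 4 + ["bully"] * 2

balanced_texts, balanced_labels = upsample(toy_texts, toy_labels)
print(Counter(balanced_labels))

for train_idx, val_idx in stratified_kfold(balanced_labels, k=2):
    print(len(train_idx), len(val_idx))
```

For brevity, this sketch balances the data before splitting. In practice, upsampling is often applied to the training folds only, so that duplicated examples cannot appear in both the training and validation sets and inflate the validation scores.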
