As social media platforms evolve, hate speech increasingly manifests across multiple modalities, including text, images, audio, and video, challenging traditional detection systems that focus on a single modality. This research therefore proposes a novel Multi-modal Hate Speech Detection Framework (MHSDF) that combines Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) to analyze complex, heterogeneous data streams. The hybrid approach leverages CNNs for spatial feature extraction, such as identifying visual cues in images and local text patterns, and Long Short-Term Memory (LSTM) networks for modeling temporal dependencies and sequential information in text and audio. For textual content, the framework utilizes state-of-the-art word embeddings, including Word2Vec and BERT, to capture semantic relationships and contextual nuances; CNNs extract n-gram patterns, while RNNs model long-range dependencies in sequences of up to 100 tokens. In visual tasks, CNNs extract key spatial features, while LSTMs process video sequences to capture evolving visual patterns. Image spatial features include object localization, color distributions, and text extracted via Optical Character Recognition (OCR). The fusion stage employs attention mechanisms to prioritize key interactions between modalities, enabling the detection of nuanced hate speech across formats, such as memes that blend offensive imagery with implicit text, sarcastic videos where toxicity is conveyed through tone and facial expressions, and multi-layered content that embeds discriminatory meaning. Numerical findings show that the proposed MHSDF achieves a detection accuracy of 98.53%, a robustness ratio of 97.64%, an interpretability ratio of 97.71%, a scalability ratio of 98.67%, and a performance ratio of 99.21%, outperforming existing models.
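The CNN text branch described above, which slides filters over token embeddings to pick out local n-gram patterns, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the embedding dimension, the tanh nonlinearity, the single-filter setup, and max-over-time pooling are illustrative assumptions consistent with standard text-CNN designs.

```python
import numpy as np

def conv1d_ngram_feature(embeddings, kernel, bias=0.0):
    """Slide one n-gram filter over a (seq_len, emb_dim) embedding matrix.

    embeddings: (seq_len, emb_dim) token embeddings (e.g. Word2Vec/BERT output)
    kernel:     (n, emb_dim) filter spanning an n-token window
    Returns the max-pooled activation for this filter, mirroring how the
    CNN branch condenses local n-gram evidence into a single feature.
    """
    n = kernel.shape[0]
    seq_len = embeddings.shape[0]
    activations = [
        np.tanh(np.sum(embeddings[i:i + n] * kernel) + bias)
        for i in range(seq_len - n + 1)
    ]
    return max(activations)  # max-over-time pooling

rng = np.random.default_rng(0)
tokens = rng.normal(size=(100, 16))      # up to 100 tokens, 16-dim embeddings
trigram_filter = rng.normal(size=(3, 16))  # one 3-gram filter
feature = conv1d_ngram_feature(tokens, trigram_filter)
```

In a full model, a bank of such filters of several widths would run in parallel, with the pooled features concatenated and passed on; the LSTM branch would consume the same token embeddings to capture the long-range dependencies the filters cannot.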
Furthermore, the model's interpretability is enhanced through attention-based explanations, which provide insight into how multi-modal hate speech is identified. The framework improves decision traceability, per-modality interpretability, and overall transparency.
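The attention-based fusion and its use as an explanation signal can be sketched as follows. This is a minimal NumPy illustration under assumed conditions, not the paper's architecture: it uses a single scaled dot-product attention step with a random query vector, and treats the resulting per-modality weights as a coarse interpretability readout.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_fusion(modality_embeddings, query):
    """Fuse per-modality embeddings with scaled dot-product attention.

    modality_embeddings: dict mapping modality name -> (d,) branch output
    query: (d,) query vector (learned in a real model; random here)
    Returns the fused representation and the per-modality attention
    weights, which double as a simple per-modality explanation.
    """
    names = list(modality_embeddings)
    M = np.stack([modality_embeddings[n] for n in names])  # (k, d)
    d = M.shape[1]
    scores = M @ query / np.sqrt(d)   # scaled dot-product scores
    weights = softmax(scores)         # sum to 1 across modalities
    fused = weights @ M               # weighted combination
    return fused, dict(zip(names, weights))

rng = np.random.default_rng(1)
branches = {m: rng.normal(size=8) for m in ("text", "image", "audio")}
fused, weights = attention_fusion(branches, rng.normal(size=8))
```

Because the weights form a distribution over modalities, inspecting them for a flagged post indicates which channel (e.g. the image in a meme versus its caption) drove the decision, which is the kind of traceability the framework targets.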
A comprehensive framework for multi-modal hate speech detection in social media using deep learning.
Authors: Prabhu R, Seethalakshmi V
| Journal: | Scientific Reports | Impact factor: | 3.900 |
| Year: | 2025 | Citation: | 2025 Apr 15; 15(1):13020 |
| doi: | 10.1038/s41598-025-94069-z | ||
