Detecting Adverse Drug Events in Social Media: A Brief Literature Review

Abstract

Adverse drug events (ADEs) remain a significant burden to public health and a persistent challenge for pharmacovigilance. The proliferation of patient-generated discourse on social media offers a complementary, real-time signal for ADE surveillance. This article provides a concise yet comprehensive review of recent natural language processing (NLP) research on identifying ADEs in social media text. We systematically reviewed 100 peer-reviewed studies (2017-2025) on NLP/AI for detecting or analysing ADEs in social media. Searches in Google Scholar targeted English-language journal and conference papers; patents and protocols were excluded. Of 130 records screened, 6 were protocols and 24 were excluded because the full text could not be located or the item was a conference abstract lacking methodological detail (i.e., no description of approaches or experiments), yielding a final sample of 100 studies. One reviewer performed screening, with full-text eligibility verified by a second. We extracted objectives, data sources/languages, preprocessing and annotation practices, datasets, model families, evaluation metrics, and stated limitations. Studies were grouped into five task categories (classification, extraction, normalisation, corpus creation, and broader analytical work), with evidence tables summarising contributions, toolchains, datasets, and performance. Recurrent challenges include noisy/imbalanced data, multilingual and code-mixed content, and variability in annotation standards. Twitter remains the primary data source: 60% of studies analyse Twitter alone and a further 18% combine Twitter with other platforms (78% in total). English overwhelmingly dominates; only about 5% of studies draw on non-English sources (e.g., French, Chinese, Arabic). Standard pre-processing (URL removal, tokenisation, and lowercasing) is near-universal.
Transformer-based models predominate, with BERT and related models, including biomedical and "tweet" variants (e.g., RoBERTa, BioBERT, BERTweet), used in more than 60% of approaches. Persistent obstacles include severe class imbalance and ambiguous or implicit drug-event expressions. Although shared tasks such as SMM4H provide widely used benchmarks, comprehensive annotation guidelines remain uncommon (12% of papers). Recent work increasingly incorporates multimodal inputs and integrates structured biomedical knowledge, yet gaps persist in multilingual coverage, temporal/longitudinal modelling, and real-world deployment. To our knowledge, this is the first review to synthesise findings from a corpus of 100 peer-reviewed studies on ADE detection in social media using NLP. By organising the literature by task type and tracing methodological trends and limitations, it provides practical guidance for researchers and practitioners. The review also outlines actionable directions for future work, including model explainability, support for low-resource languages, and closer collaboration with regulatory authorities to enable real-world deployment.
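
As an illustration of the standard pre-processing steps the abstract names (URL removal, tokenisation, and lowercasing), the following is a minimal Python sketch. It is not taken from any of the reviewed studies: the function name `preprocess`, the two regular expressions, and the sample post are all assumptions chosen for illustration, and real pipelines typically use a dedicated tweet tokeniser instead.

```python
import re
from typing import List

# Assumed pattern: anything starting with http(s):// or www. up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")

# Assumed tokeniser: words (optionally prefixed by @ or #, optionally with an
# apostrophe contraction), or any single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"[@#]?\w+(?:'\w+)?|[^\w\s]")


def preprocess(post: str) -> List[str]:
    """Apply URL removal, lowercasing, and simple regex tokenisation to one post."""
    text = URL_PATTERN.sub(" ", post)   # 1. strip URLs
    text = text.lower()                 # 2. lowercase
    return TOKEN_PATTERN.findall(text)  # 3. tokenise


if __name__ == "__main__":
    # Hypothetical example post, not drawn from any dataset.
    sample = "Took Ibuprofen yesterday, now my stomach HURTS https://t.co/abc123 #sideeffects"
    print(preprocess(sample))
```

Lowercasing discards capitalisation cues such as emphasis ("HURTS") or brand names, which is one reason some cased transformer models skip that step; the sketch only mirrors the steps listed above.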
