Abstract
BACKGROUND: The spread of tobacco-related information that is inconsistent with public health guidance significantly affects public health, particularly among people with less access to reliable information sources (such as those with lower educational attainment), who may also suffer disproportionate tobacco-related morbidity and mortality. This study develops a multi-dimensional analytical framework for identifying and categorizing tobacco-related information on social media. Using a dataset of tweets, the framework was constructed through qualitative analysis and then compared with an exploratory, AI-assisted analysis to assess the capabilities of current automated tools.
METHODS: A collection of 3.4 million tweets related to tobacco and nicotine was reduced to 842,754 after irrelevant and duplicate posts were removed. Latent Dirichlet allocation (LDA) topic modeling identified six distinct topics, from which two random samples of tweets were drawn for qualitative analysis and AI-assisted analysis to identify categories of tobacco information.
RESULTS: The identified tobacco-related information was categorized along three dimensions: (1) content, including safety and health effects, cessation, substance, and policy; (2) type of falsehood, including fabrication and unsubstantiated claims, misrepresentations, and distortions; and (3) source, ranging from individuals and retail stores to advocacy groups and influencers. A notable finding was the prevalence of policy-related discussions of tobacco information on Twitter (X), which highlights this often-overlooked domain. The controversy over vaping has amplified pro-vaping voices on social media, with content frequently misinterpreting scientific findings, policies, and expert opinions, reflecting falsehoods that are more nuanced and harder to recognize.
CONCLUSION: This study offers a comprehensive framework for analyzing tobacco-related information on social media, emphasizing key issues in policy debates and the presence of conspiracy narratives. This framework can inform the design of interventions for less informed populations and enhance data annotation for machine learning tasks.