Human–AI Collaboration for Automated Classification of Online Information Sources on Cervical Cancer

Abstract

BACKGROUND: Cancer information now spans diverse online platforms. However, traditional methods such as manual coding cannot efficiently identify information sources in large-scale datasets. This study introduces a novel approach that uses prompt engineering to classify online sources of cervical cancer information automatically and thematically.

METHODS: We identified 1,877 Korean online communities, referred to as "cafés", that provide cervical cancer information. An initial codebook was developed with GPT-4o using a zero-shot approach. Two human coders then reviewed a sample of 500 cafés and iteratively added categories until theoretical saturation was reached, thereby refining the initial codebook. To validate the finalized codebook, which consisted of twelve categories, two coders independently coded a separate sample of 200 cafés (Cohen's kappa = 0.82; 95% CI [0.76–0.88]). We then designed a structured prompt for the automated classification of the full dataset.

RESULTS: The prompts followed a step-by-step structure consisting of (1) main keywords for classification and (2) specific instructions. An initial prompt applied to GPT-4o showed acceptable agreement with the human coders (Krippendorff's α = 0.84; 95% CI [0.82–0.86] for the full dataset). The finalized prompt, which contained additional detailed instructions, was applied to both GPT-4o and Gemini 1.5 Pro. The results showed substantial agreement among the human coders, GPT-4o, and Gemini 1.5 Pro (Krippendorff's α = 0.81; 95% CI [0.80–0.83]).

CONCLUSIONS: This study highlights the potential of human–AI collaboration in large-scale thematic classification. By combining the efficiency of AI with human oversight, the proposed approach strengthens both methodological validity and interpretive reliability. It offers a scalable pathway for future research in public health, infodemiology, and health communication.
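
The abstract reports agreement as Cohen's kappa (two human coders) and Krippendorff's alpha (human coders together with one or two models). A minimal sketch of both statistics for nominal labels follows, assuming each café receives exactly one category label per rater. It is not the authors' code, and it omits the 95% confidence intervals because the abstract does not say how they were estimated.

```python
from collections import Counter
from itertools import permutations


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assign one nominal label per unit."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of units")
    n = len(coder_a)
    # Observed agreement: share of units on which the two labels match.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: overlap of the two coders' marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha (nominal metric) for any number of raters.

    `units` holds one list per unit with the labels the raters assigned;
    missing ratings are simply left out of that unit's list.
    """
    coincidence = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit needs at least two ratings to be pairable
        # Every ordered pair of ratings from different raters, weighted 1/(m - 1).
        for i, j in permutations(range(m), 2):
            coincidence[(labels[i], labels[j])] += 1 / (m - 1)
    n = sum(coincidence.values())  # total number of pairable values
    marginals = Counter()
    for (c, _k), weight in coincidence.items():
        marginals[c] += weight
    # Nominal metric: any pair of different categories counts as disagreement.
    observed = sum(w for (c, k), w in coincidence.items() if c != k)
    expected = sum(marginals[c] * marginals[k]
                   for c in marginals for k in marginals if c != k)
    if expected == 0:
        raise ValueError("alpha is undefined when fewer than two categories occur")
    return 1 - (n - 1) * observed / expected
```

For the 200-café validation sample, `coder_a` and `coder_b` would be the two coders' label lists. For the model comparison, each inner list in `units` would hold the human, GPT-4o, and Gemini 1.5 Pro labels for one café. Krippendorff's alpha is used for the multi-rater case because, unlike Cohen's kappa, it handles more than two raters and missing ratings.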

KEY MESSAGES:
• Generative AI models (GPT-4o and Gemini 1.5 Pro) can reliably replicate human coding judgments in multi-category classification tasks when guided by a structured prompt.
• Human–AI collaboration effectively supports the identification of key information sources in cancer infodemiology by combining AI efficiency with human oversight.
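
The two-part prompt structure described in the abstract, (1) main keywords per category followed by (2) specific instructions, can be sketched as a small template builder. Everything named below is hypothetical: the helpers `build_prompt` and `parse_label`, the `Category` type, and any category names or keywords you pass in. The abstract does not list the twelve categories or the prompt wording, and no model API is called here.

```python
from dataclasses import dataclass


@dataclass
class Category:
    """Hypothetical codebook entry: a category name and its cue keywords."""
    name: str
    keywords: list[str]


def build_prompt(categories, instructions, cafe_name, cafe_description):
    """Assemble a step-by-step prompt: keywords first, then instructions."""
    lines = ["Step 1. Main keywords for each category:"]
    for cat in categories:
        lines.append(f"- {cat.name}: {', '.join(cat.keywords)}")
    lines.append("Step 2. Instructions:")
    lines.extend(f"{i}. {text}" for i, text in enumerate(instructions, start=1))
    # The café to classify is appended after the codebook and the instructions.
    lines.append("Café to classify:")
    lines.append(f"Name: {cafe_name}")
    lines.append(f"Description: {cafe_description}")
    lines.append("Answer with exactly one category name.")
    return "\n".join(lines)


def parse_label(reply, categories):
    """Map a model reply to a known category, or return None for human review."""
    cleaned = reply.strip().strip(".").lower()
    for cat in categories:
        if cleaned == cat.name.lower():
            return cat.name
    return None
```

Returning `None` for replies that match no category is one way to route uncertain cases to a human coder, in line with the human oversight the key messages emphasize. This routing is a design assumption and is not a step reported in the abstract.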
