Automated Multitier Tagging of Chinese Online Health Education Resources Using a Large Language Model: Development and Validation Study

Abstract

BACKGROUND: Precision health promotion, which aims to tailor health messages to individual needs, is hampered by the lack of structured metadata in vast digital health resource libraries. This bottleneck prevents scalable, personalized content delivery and exacerbates information overload for the public.
OBJECTIVE: This study aimed to develop, deploy, and validate an automated tagging system using a large language model (LLM) to create the foundational metadata infrastructure required for tailored health communication at scale.
METHODS: We developed a comprehensive, 3-tier health promotion taxonomy (10 primary, 34 secondary, and 90,562 tertiary tags) using a hybrid Delphi and corpus-mining methodology. We then constructed a hybrid inference pipeline by fine-tuning a Baichuan2-7B LLM with low-rank adaptation for initial tag generation. This was then refined by a domain-specific named entity recognition model and standardized against a vector database. The system's performance was evaluated against manual annotations from nonexpert staff on a test set of 1000 resources. We used a "no gold standard" framework, comparing the artificial intelligence-human (A-H) interrater reliability (IRR) with a supplemental human-human (H-H) IRR baseline and expert adjudication for cases where artificial intelligence provided additional tags ("AI Additive").
RESULTS: The A-H agreement was moderate (Cohen κ=0.54, 95% CI 0.53-0.56; Jaccard similarity coefficient=0.48, 95% CI 0.46-0.50). Critically, this was higher than the baseline nonexpert H-H agreement (Cohen κ=0.32, 95% CI 0.29-0.35; Jaccard similarity coefficient=0.35, 95% CI 0.27-0.43). A granular analysis of disagreements revealed that in 15.9% (159/1000) of the cases, the "AI Additive" tags were not identified by human annotators. Expert adjudication of these cases confirmed that the "AI Additive" tags were correct and relevant with a precision of 90% (45/50; 95% CI 78.2%-96.7%).
CONCLUSIONS: A fine-tuned LLM, integrated into a hybrid pipeline, can function as a powerful augmentation tool for health content annotation. The system's consistency (A-H κ=0.54) was found to be superior to the baseline human workflow (H-H κ=0.32). By moving beyond simple automation to reliably identify relevant health topics missed by manual annotators with high, expert-validated accuracy, this study provides a robust technical and methodological blueprint for implementing artificial intelligence to enhance precision health communication in public health settings.
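
The agreement and precision statistics reported above can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not say how Cohen κ was computed over multilabel tag sets or which interval method was used for the precision estimate, so the sketch assumes one categorical decision per item for κ and an exact (Clopper-Pearson) binomial interval. The function names and the example tags are hypothetical.

```python
from math import comb


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two tag sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cohen_kappa(r1: list, r2: list) -> float:
    """Cohen kappa for two raters who each assign one label per item."""
    n = len(r1)
    labels = set(r1) | set(r2)
    # Observed agreement: share of items where both raters gave the same label.
    p_o = sum(x == y for x, y in zip(r1, r2)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    p_e = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    if p_e == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Exact two-sided binomial confidence interval, found by bisection."""

    def solve(cdf_at, target):
        # The binomial CDF decreases as p grows, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if cdf_at(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound: P(X >= k | p) = alpha/2, i.e. P(X <= k-1 | p) = 1 - alpha/2.
    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound: P(X <= k | p) = alpha/2.
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical tag sets for one resource (AI vs. human annotator).
ai_tags = {"diabetes", "diet"}
human_tags = {"diabetes", "exercise"}
print(round(jaccard(ai_tags, human_tags), 3))

# Hypothetical binary decisions (1 = tag applied) for four resource-tag pairs.
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))

# Precision counts taken from the abstract: 45 correct out of 50 adjudicated.
low, high = clopper_pearson(45, 50)
print(f"{low:.3f} to {high:.3f}")
```

Under the exact-interval assumption, the last call should land close to the reported 78.2%-96.7% range. A different interval method (for example, Wilson) would give a somewhat different upper bound.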
