[Application of large language models in health education for patients with pediatric cataract]

[大型语言模型在儿童白内障患者健康教育中的应用]

Abstract

OBJECTIVES: Pediatric cataract occurs during the critical period of visual development, and early intervention is essential to avoid irreversible visual impairment. The health literacy and self-management ability of children and their parents directly affect treatment adherence and prognosis. Given the rapid development of artificial intelligence, this study aimed to evaluate the accuracy, completeness, and repeatability of domestic open-source large language models (LLMs) in answering common clinical questions from pediatric cataract patients, and to explore their potential as an online health information resource for these patients.

METHODS: The research team collected real patient questions posted on mainstream online medical platforms since 2016 and categorized them into 5 major domains: risk factors, disease diagnosis, symptoms and staging, screening and examinations, and treatment and prognosis. After expert review, 40 high-attention questions were finalized, and experts provided manual reference answers. Four domestic open-source LLMs (Kimi chat, Doubao, ERNIE Bot 3.5, DeepSeek) were selected. Each question was asked 4 times, 2 of them with a "patient-physician" role prompt. Three cataract specialists holding the title of associate chief physician or above blindly scored the answers using a 4-level accuracy scale, a 3-level completeness scale, and a 3-level reproducibility scale. The evaluation followed a two-stage scheme: Stage 1 preliminarily tested the 4 LLMs using 6 questions of recognized lower difficulty; Stage 2 performed a full evaluation of all 40 questions on the highest-scoring LLM from Stage 1.

RESULTS: In Stage 1, regardless of role prompting, Kimi chat performed best, followed by Doubao and ERNIE Bot 3.5, with DeepSeek ranking last. The proportion of Kimi chat answers scoring accuracy=4, completeness=3, and reproducibility=3 was higher than that of Doubao, ERNIE Bot 3.5, and DeepSeek. In Stage 2, Kimi chat answered all 40 questions. Its median answer length was 531 (277, 1 059) words, significantly longer than the manual reference answers at 369 (162, 707) words (Z=-4.096, P<0.001). However, answer length showed no significant correlation with accuracy or completeness (both P>0.05). Across 240 model responses, the proportions were: accuracy≥3: 83.8%, completeness=3: 77.9%, and repeatability≥70%: 66.7%. Evaluators selected the Kimi chat answer as their top preference in 62.1% (149/240) of cases. Reasons for not selecting it included off-topic responses, controversial suggestions, and redundant information.

CONCLUSIONS: Domestic open-source LLMs, especially Kimi chat, performed relatively well in pediatric cataract health education scenarios, providing parents with medical information of good accuracy, completeness, and reproducibility. LLMs have great potential in healthcare, but information security, hallucination, and bias remain key challenges, and LLMs still cannot replace clinical physicians. In the future, LLMs are expected to work alongside physicians to deliver more efficient and personalized medical services and promote the development of healthcare.
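As a reading aid, the Stage 2 percentages can be checked for arithmetic consistency against the stated denominator of 240 responses. The sketch below is an illustration only, not the authors' analysis: the abstract gives a raw count for the preference result alone (149/240), so the other counts the script derives are inferred from the rounded percentages, and the helper names `implied_count` and `consistent` are hypothetical.

```python
# Percentages reported in the abstract, each out of 240 model responses.
TOTAL = 240
reported = {
    "accuracy >= 3": 83.8,
    "completeness = 3": 77.9,
    "repeatability >= 70%": 66.7,
    "top preference": 62.1,  # the only item with an explicit count: 149/240
}


def implied_count(percent, total=TOTAL):
    """Integer count whose share of `total` is closest to `percent`.

    This is an inference from a rounded percentage, not a reported figure.
    """
    return round(percent / 100 * total)


def consistent(percent, total=TOTAL):
    """True if the implied count matches `percent` to one decimal place."""
    share = implied_count(percent, total) / total * 100
    # Half a unit in the first decimal, plus slack for floating-point error.
    return abs(share - percent) <= 0.05 + 1e-9


implied = {name: implied_count(p) for name, p in reported.items()}

for name, percent in reported.items():
    print(f"{name}: {percent}% ~ {implied[name]}/{TOTAL}, "
          f"consistent={consistent(percent)}")
```

For the one item where the abstract states the count, the inferred value agrees with the reported 149/240; the remaining counts should be read as plausible reconstructions only.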
