Evaluating the Effectiveness of Generative AI for the Creation of Patient Education Materials on Coronary Heart Disease: A Comparative Study


Abstract

BACKGROUND: Generative artificial intelligence (AI) has shown great potential in various fields, including health care. However, its application for developing patient education materials (PEMs), particularly for patients with coronary heart disease (CHD), remains underexplored. Traditional methods for creating these materials are time-consuming and lack personalization, which limits their effectiveness.

OBJECTIVE: This study aims to explore the effectiveness of generative AI tools (ChatGPT and DeepSeek) in generating PEMs for patients with CHD and to compare the resulting materials with those developed by a professional medical team.

METHODS: In February 2025, PEMs for patients with CHD were developed using a framework designed by a professional medical team. Structured prompts were used to generate materials with 2 generative AI models: ChatGPT-4o and DeepSeek R1. These AI-generated materials were compared with those created by the medical team in terms of development time, readability, understandability, actionability, and accuracy.

RESULTS: The total time for manual preparation was 14 hours, while ChatGPT and DeepSeek required 0.62 hours and 0.78 hours, respectively. Regarding readability, the frequency of difficult words was more variable in the manually written and ChatGPT materials, while DeepSeek was more consistent. The proportion of simple sentences was highest with DeepSeek, followed by ChatGPT, with complete separation between the manually written and ChatGPT materials (δ=1). Content word frequency was highest in the manually written PEMs, while ChatGPT had the lowest but most stable values. Personal pronouns were used most frequently, with high variability, in the manually written PEMs and least frequently, with stable values, in the DeepSeek materials. All 3 methods had similar readability levels, reaching Chinese elementary school-level readability for the proportion of simple sentences and for personal pronouns, and high school-level readability for difficult words and content word frequency. The understandability and actionability scores were all above 70, with ChatGPT more stable for understandability and DeepSeek more stable for actionability; no significant differences were found between groups. In terms of accuracy, the overall intergroup comparison was significant (H=7.27; P=.03), but pairwise multiple comparisons showed no significant differences. The direct comparison between ChatGPT and DeepSeek showed a negligible effect size (δ=0.02) and no significant difference (z-score=-0.06; P=.96). Accuracy issues in the AI-generated materials were noted by 4 of 8 experts.

CONCLUSIONS: Generative AI significantly improved the efficiency of developing PEMs for patients with CHD. The materials generated by ChatGPT-4o and DeepSeek R1 were comparable to the professionally written ones in readability, understandability, and actionability. However, reducing the number of difficult words and increasing content word frequency would further enhance readability. The accuracy of AI-generated materials remains a concern, including the potential for AI "hallucinations," and requires review by health care professionals. Generative AI holds considerable potential for generating PEMs, and future research should assess its applicability and effectiveness in real-world patient and family contexts.
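The abstract reports Cliff's δ as its effect size, with δ=1 indicating complete separation between two groups and δ=0.02 indicating near-total overlap. As a minimal sketch of how this statistic behaves, the pure-Python function below computes Cliff's δ for two samples; the example data are illustrative values only, not the study's ratings:

```python
from itertools import product

def cliffs_delta(a, b):
    """Cliff's delta effect size for two samples.

    δ = (#{a_i > b_j} - #{a_i < b_j}) / (|a| * |b|),
    ranging from -1 (all of a below b) to +1 (all of a above b);
    ties contribute nothing, and values near 0 mean heavy overlap.
    """
    gt = sum(x > y for x, y in product(a, b))
    lt = sum(x < y for x, y in product(a, b))
    return (gt - lt) / (len(a) * len(b))

# Fully separated groups give δ = 1, as reported for the
# simple-sentence proportion (manual vs ChatGPT materials):
print(cliffs_delta([4, 5, 6], [1, 2, 3]))  # → 1.0

# Heavily overlapping groups give δ near 0, as reported for
# accuracy (ChatGPT vs DeepSeek, δ=0.02); these numbers are made up:
print(cliffs_delta([7, 8, 8, 9], [7, 8, 9, 8]))  # → 0.0
```

A common rule of thumb treats |δ| below about 0.147 as negligible, which is why the reported δ=0.02 between ChatGPT and DeepSeek is described as a negligible effect.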
