Streamlining Ophthalmic Documentation With Anonymized, Fine-Tuned Language Models: Feasibility Study

Abstract

BACKGROUND: The growing administrative burden on clinicians, particularly in medical documentation, contributes to burnout and may compromise patient safety. Recent advances in generative artificial intelligence (AI) offer a promising way to improve documentation processes and address these challenges.

OBJECTIVE: This study aims to evaluate the feasibility of using a fine-tuned OpenAI Curie model to automate the generation of medical report summaries (epicrises) in ophthalmology. By assessing the model's performance through human and automated evaluations, this study seeks to determine its potential for reducing clinician workload while ensuring accuracy, usefulness, and compliance with regulatory requirements.

METHODS: A data set of around 60,000 medical letters was anonymized using a custom algorithm to comply with General Data Protection Regulation guidelines. The Curie model was fine-tuned on this data set to generate epicrises from medical histories, diagnoses, and findings. Performance was evaluated through several human assessments and through automated evaluations by 2 large language models (LLMs).

RESULTS: In the clinical context, 49.9% (384/769) of epicrises were evaluated as helpful or excellent, whereas only 25.2% (194/769) were considered disturbing. In a human (manual) evaluation, formal correctness was rated significantly higher than the neutral midpoint of 2.5 on the 4-point rating scale, as determined by a 1-sample Wilcoxon signed-rank test (mean 3.59, SD 0.85; W=1686; P<.001). Paired t tests showed a significant time saving: manually writing an epicrisis took longer than correcting an AI-generated one (mean 109.52, SD 53.30 s vs mean 54.25, SD 63.34 s; t(68)=3.39; P<.01). While medical accuracy and usefulness showed positive trends, these did not reach statistical significance when compared to the neutral midpoint (for medical accuracy, W=7456; P=.08; for usefulness, W=7652.5; P=.18). Epicrises generated or corrected with AI were significantly shorter than manually written ones (mean 330.43, SD 115.42 vs mean 501.07, SD 243.50 characters; t(68)=-6.10; P<.001). Automated LLM assessments showed alignment with human ratings, with 52.4% (356/679) and 65.8% (489/743) of responses from the 2 LLMs, respectively, falling in the top agreement categories. This supports overall consistency, though the comparison remains a proof of concept given methodological limitations.

CONCLUSIONS: Our study demonstrates the technical and practical feasibility of introducing fine-tuned commercial LLMs into clinical practice. The AI-generated epicrises were formally and clinically correct in many cases and showed time-saving potential. While medical accuracy and usefulness varied across cases and should be a focus of further development, a significant workload reduction is likely. Our anonymization process showed that regulatory challenges arising from the use of AI with patient data can be addressed effectively. In summary, this study highlights the promise of transformer-based LLMs in reducing administrative tasks in health care. It outlines a pipeline for integrating LLMs into European Union clinical practice, emphasizing the need for careful implementation to ensure efficiency and patient safety.
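
The two tests named in the results (a paired t test on writing times and a 1-sample Wilcoxon signed-rank test against the scale midpoint of 2.5) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names and the toy numbers are hypothetical, only the test statistics are computed (no P values), and the abstract does not say which variant of the Wilcoxon statistic (for example, the sum of positive ranks or the smaller of the two rank sums) was reported as W.

```python
import math
from statistics import mean, stdev


def paired_t(a, b):
    """Return (t, df) for a paired t test on two equal-length samples."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("samples must have equal length of at least 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


def wilcoxon_w_plus(ratings, midpoint):
    """Sum of ranks of the positive differences from `midpoint`.

    Ratings equal to the midpoint are dropped; tied absolute
    differences receive the average of the ranks they span.
    """
    diffs = sorted((r - midpoint for r in ratings if r != midpoint), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        while j < len(diffs) and abs(diffs[j]) == abs(diffs[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    return sum(rank for rank, d in zip(ranks, diffs) if d > 0)


# Hypothetical toy data, NOT taken from the study.
manual_seconds = [120, 95, 140, 110]
corrected_seconds = [60, 70, 50, 65]
toy_ratings = [4, 4, 3, 2, 4, 1]  # 4-point scale, midpoint 2.5

t_stat, df = paired_t(manual_seconds, corrected_seconds)
w_plus = wilcoxon_w_plus(toy_ratings, 2.5)
print(f"t({df}) = {t_stat:.2f}, W+ = {w_plus}")
```

A positive t here means the first sample (manual writing) took longer on average, matching the direction of the time comparison in the results. For real analyses, an established statistics package that also returns P values would be the more appropriate choice.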
