Large language models for structured cardiovascular data extraction: a foundation for scalable research and clinical applications



Abstract

AIMS: Automated extraction of information from cardiac reports would benefit both clinical reporting and research. Large language models (LLMs) hold promise for such automation, but their clinical performance and practical implementation across various computational environments remain unclear. This study aims to evaluate the feasibility and performance of LLM-based classification of echocardiogram and invasive coronary angiography reports, using real-world clinical data across local, high-performance computing, and cloud-based platforms.

METHODS AND RESULTS: The angiography and echocardiography reports of 1000 patients admitted with acute coronary syndrome were labelled for multiple key diagnostic elements, including left ventricular function (LVF), culprit vessel, and acute occlusions. Report classification models were developed using LLMs via (i) prompt-based and (ii) fine-tuning approaches. Performance was assessed across different model types and compute infrastructures, with attention to class imbalance, ambiguous label annotations, and implementation costs. Large language models demonstrated strong performance in extracting structured diagnostic information from cardiac reports. Cloud-based models (such as GPT-4o) achieved the highest accuracy (0.87 for culprit vessel and 1.0 for LVF) and generalizability, while smaller models run on a local high-performance cluster also achieved reasonable accuracy, especially for less complex tasks (0.634 for culprit vessel and 0.984 for LVF). Classification was feasible with minimal pre-processing, enabling potential integration into electronic health record systems or research pipelines. Class imbalance, reflective of real-world prevalence, had a greater impact on fine-tuning approaches.

CONCLUSION: Large language models can reliably classify structured cardiology reports across diverse compute infrastructures. Their accuracy and adaptability support their use in clinical and research settings, particularly for scalable report structuring and dataset generation.
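To illustrate why class imbalance matters when reading accuracy figures such as those reported above, the following is a minimal sketch that compares plain accuracy with per-class recall and balanced accuracy. It is not the authors' evaluation code: the function names, the vessel labels (`LAD`, `RCA`, `LCX`), and the toy label lists are all assumptions made up for illustration and are not study data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted label matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """For each annotated class: correctly predicted reports / reports in that class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall, so rare classes count as much as common ones."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical, imbalanced toy annotations (NOT data from the study).
y_true = ["LAD"] * 6 + ["RCA"] * 3 + ["LCX"] * 1
# A hypothetical classifier that leans towards the majority class.
y_pred = ["LAD"] * 6 + ["RCA"] * 2 + ["LAD"] + ["LAD"]

print("accuracy:", accuracy(y_true, y_pred))
print("per-class recall:", per_class_recall(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
```

In this toy case the plain accuracy looks considerably better than the balanced accuracy, because the errors are concentrated in the rare classes. This is one reason an evaluation might report per-class metrics alongside overall accuracy when label prevalence reflects real-world data.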
