Extracting language information from clinical notes using large language models



Abstract

BACKGROUND: Patient language proficiency plays a critical role in equitable, patient-centered care and language-related clinical research. However, language information recorded in structured fields of electronic health records (EHRs) is often incomplete or inaccurate, especially in multi-institutional settings with heterogeneous documentation practices.

OBJECTIVE: To develop and evaluate a named entity recognition (NER) pipeline that accurately extracts detailed patient language status from unstructured clinical notes using large language models (LLMs), thereby enabling scalable and generalizable language information extraction.

METHODS: We defined four categories of language status (fluent use, partial ability, lack of understanding, and language mentions unrelated to the patient) and annotated two datasets from Yale New Haven Hospital (YNHH) and MIMIC-III. We evaluated the performance of proprietary and open-source LLMs, including GPT-4o, LLaMA3, and BERT, under zero-shot and fine-tuning settings. Cross-site validation was conducted to assess generalizability across institutions.

RESULTS: GPT-4o achieved F1 scores of 87% and 82% on the YNHH and MIMIC datasets, respectively, without fine-tuning. Fine-tuned open-source models such as BERT and LLaMA3 reached comparable or superior performance when trained on sufficient annotated data. Cross-institutional evaluations confirmed that LLMs, particularly LLaMA3, exhibited stronger generalizability than traditional models. Language mentions unrelated to patient fluency remained the most challenging category across all models.

CONCLUSION: Our NER framework enables automated extraction of nuanced language information from clinical narratives with high accuracy and generalizability. This work supports large-scale, language-focused research and has practical implications for improving patient-provider communication, interpreter service allocation, and equitable healthcare delivery.
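To make the reported F1 scores concrete, the following is a minimal sketch of how span-level F1 could be computed for a four-category NER task of this kind. The abstract does not specify the matching criterion or the label names, so the exact-match rule, the label strings (`FLUENT`, `PARTIAL`, `NO_UNDERSTANDING`, `UNRELATED`), and the toy spans below are all assumptions made for illustration, not the authors' evaluation code.

```python
# Hypothetical label names for the four language-status categories;
# the abstract describes the categories but does not name the labels.
LABELS = ("FLUENT", "PARTIAL", "NO_UNDERSTANDING", "UNRELATED")


def span_f1(gold, pred):
    """Return (precision, recall, F1) under exact span matching.

    Each span is a (start, end, label) tuple. A prediction counts as
    correct only if all three fields match a gold span (an assumption;
    relaxed or overlap-based matching is also common).
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def per_label_f1(gold, pred):
    """Compute F1 separately for each category to expose the hard ones."""
    return {
        label: span_f1(
            [s for s in gold if s[2] == label],
            [s for s in pred if s[2] == label],
        )[2]
        for label in LABELS
    }


# Toy spans (character offsets are made up for illustration).
gold = [(0, 7, "FLUENT"), (20, 27, "PARTIAL"), (40, 47, "UNRELATED")]
pred = [(0, 7, "FLUENT"), (20, 27, "NO_UNDERSTANDING"), (40, 47, "UNRELATED")]

overall = span_f1(gold, pred)
by_label = per_label_f1(gold, pred)
```

Reporting per-category scores alongside the overall figure is what makes it possible to see that one category, such as language mentions unrelated to the patient, lags behind the others.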
