LCD benchmark: long clinical document benchmark on mortality prediction for language models

LCD基准测试:用于语言模型的长临床文档死亡率预测基准测试

阅读:1

Abstract

OBJECTIVES: The application of natural language processing (NLP) in the clinical domain is important due to the rich unstructured information in clinical documents, which often remains inaccessible in structured data. When applying NLP methods to a certain domain, the role of benchmark datasets is crucial as benchmark datasets not only guide the selection of best-performing models but also enable the assessment of the reliability of the generated outputs. Despite the recent availability of language models capable of longer context, benchmark datasets targeting long clinical document classification tasks are absent. MATERIALS AND METHODS: To address this issue, we propose Long Clinical Document (LCD) benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes of Medical Information Mart for Intensive Care IV and statewide death data. We evaluated this benchmark dataset using baseline models, from bag-of-words and convolutional neural network to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations. RESULTS: Baseline models showed 28.9% for best-performing supervised models and 32.2% for GPT-4 in F1 metrics. Notes in our dataset have a median word count of 1687. DISCUSSION: Our analysis of the model outputs showed that our dataset is challenging for both models and human experts, but the models can find meaningful signals from the text. CONCLUSION: We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。