Understanding Uncertainty in Large Language Model Predictions of Early Death in Critically Ill Patients: A Conformal Prediction Approach

Abstract

BACKGROUND: Early prediction of in-hospital death remains a significant challenge because little structured data is available at initial admission. Unstructured clinical notes, which often contain important observations and impressions, are an underutilized resource for real-time risk stratification. Recent advances in large language models (LLMs) offer a promising way to use this unstructured information, but the poor understanding of the patient-level uncertainty of LLM predictions for such critical forecasts is a serious deterrent to their use in clinical settings.

OBJECTIVE: This study aims to evaluate the effectiveness and confidence of an LLM, specifically GPT-4o, in predicting the probability of in-hospital death for an individual patient from unstructured clinical notes.

METHODS: We applied conformal prediction to quantify the uncertainty of GPT-4o's zero-shot predictions of in-hospital death, using concatenated clinical notes documented during the first 24 hours of intensive care unit (ICU) admission in MIMIC-III for patients with acute kidney failure who were admitted through the emergency department (ED).

RESULTS: Of the two classes, "in-hospital death" and "in-hospital survive", the GPT model performed better on the in-hospital death class, achieving precision 0.52 (95% CI 0.48-0.56), recall 0.93 (95% CI 0.90-0.95), and F1-score 0.66 (95% CI 0.63-0.70). The conformal prediction (CP) framework provided an overall empirical coverage of 90.4%, exceeding the 90% target. However, class-specific coverage was imbalanced: 99.7% for the death class and 81.1% for the survival class.

CONCLUSIONS: The model's outputs exhibit overconfidence, particularly in cases of incorrect predictions. Integrating conformal prediction provides a promising approach to quantifying and calibrating uncertainty in large language model outputs for individual patient predictions, thereby enhancing their potential applicability for clinical decision-making.
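
The conformal prediction step in METHODS can be illustrated with a minimal sketch. The abstract does not state which nonconformity score or calibration split the authors used, so this example assumes split conformal prediction with the common score "1 minus the probability assigned to the true class". The probabilities are synthetic stand-ins for GPT-4o outputs, and every function name here (`score`, `conformal_threshold`, `prediction_set`, `coverage`, `simulate`) is hypothetical. It shows how a single threshold targeting 90% overall coverage can still leave the two classes covered unequally.

```python
import math
import random


def score(p_death, label):
    """Nonconformity score: 1 minus the probability given to `label`."""
    p_label = p_death if label == 1 else 1.0 - p_death
    return 1.0 - p_label


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration: return the score threshold q_hat."""
    scores = sorted(score(p, y) for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return 1.0  # too few calibration points: both labels always included
    return scores[rank - 1]


def prediction_set(p_death, q_hat):
    """All labels (0 = survived, 1 = died) whose score is within q_hat."""
    return [label for label in (0, 1) if score(p_death, label) <= q_hat]


def coverage(probs, labels, q_hat, only_class=None):
    """Fraction of cases whose prediction set contains the true label."""
    hits = [
        y in prediction_set(p, q_hat)
        for p, y in zip(probs, labels)
        if only_class is None or y == only_class
    ]
    return sum(hits) / len(hits)


def simulate(n, rng):
    """Synthetic data (an assumption, not MIMIC-III): a model that leans
    toward predicting death, so about half the survivors get a high p_death."""
    probs, labels = [], []
    for _ in range(n):
        y = 1 if rng.random() < 0.35 else 0
        if y == 1 or rng.random() < 0.5:
            p = rng.uniform(0.6, 1.0)
        else:
            p = rng.uniform(0.0, 0.4)
        probs.append(p)
        labels.append(y)
    return probs, labels


rng = random.Random(0)
cal_probs, cal_labels = simulate(500, rng)
test_probs, test_labels = simulate(2000, rng)

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
print("threshold:", round(q_hat, 3))
print("overall coverage:", round(coverage(test_probs, test_labels, q_hat), 3))
print("death coverage:", round(coverage(test_probs, test_labels, q_hat, only_class=1), 3))
print("survival coverage:", round(coverage(test_probs, test_labels, q_hat, only_class=0), 3))
```

With one shared threshold, the guarantee applies only to overall (marginal) coverage, so the class the model handles worse tends to be covered less often. Calibrating a separate threshold per class is one known way to target class-specific coverage, though the abstract does not say whether the authors did so.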
