Uncertainty aware training to improve deep learning model calibration for classification of cardiac MR images

不确定性感知训练以提高深度学习模型校准精度,用于心脏磁共振图像分类

阅读:2

Abstract

Quantifying uncertainty of predictions has been identified as one way to develop more trustworthy artificial intelligence (AI) models beyond conventional reporting of performance metrics. When considering their role in a clinical decision support setting, AI classification models should ideally avoid confident wrong predictions and maximise the confidence of correct predictions. Models that do this are said to be well calibrated with regard to confidence. However, relatively little attention has been paid to how to improve calibration when training these models, i.e. to make the training strategy uncertainty-aware. In this work we: (i) evaluate three novel uncertainty-aware training strategies with regard to a range of accuracy and calibration performance measures, comparing against two state-of-the-art approaches, (ii) quantify the data (aleatoric) and model (epistemic) uncertainty of all models and (iii) evaluate the impact of using a model calibration measure for model selection in uncertainty-aware training, in contrast to the normal accuracy-based measures. We perform our analysis using two different clinical applications: cardiac resynchronisation therapy (CRT) response prediction and coronary artery disease (CAD) diagnosis from cardiac magnetic resonance (CMR) images. The best-performing model in terms of both classification accuracy and the most common calibration measure, expected calibration error (ECE) was the Confidence Weight method, a novel approach that weights the loss of samples to explicitly penalise confident incorrect predictions. The method reduced the ECE by 17% for CRT response prediction and by 22% for CAD diagnosis when compared to a baseline classifier in which no uncertainty-aware strategy was included. In both applications, as well as reducing the ECE there was a slight increase in accuracy from 69% to 70% and 70% to 72% for CRT response prediction and CAD diagnosis respectively. However, our analysis showed a lack of consistency in terms of optimal models when using different calibration measures. This indicates the need for careful consideration of performance metrics when training and selecting models for complex high risk applications in healthcare.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。