Potential and limitations of machine meta-learning (ensemble) methods for predicting COVID-19 mortality in a large inhospital Brazilian dataset

Authors: Bruno Barbosa Miranda de Paiva, Polianna Delfino Pereira, Claudio Moisés Valiense de Andrade, Virginia Mara Reis Gomes, Maira Viana Rego Souza-Silva, Karina Paula Medeiros Prado Martins, Thaís Lorenna Souza Sales, Rafael Lima Rodrigues de Carvalho, Magda Carvalho Pires, Lucas Emanuel Ferreira Ramos

Journal: Scientific Reports
Impact factor: 3.800
Year: 2023
Citation: 2023 Mar 1;13(1):3463.
DOI: 10.1038/s41598-023-28579-z
Research area: Immunology
Disease type: COVID-19

Abstract

Most early prediction scores and methods for predicting COVID-19 mortality are limited by methodological flaws and technological constraints (e.g., reliance on a single prediction model). Our aim is to provide a thorough comparative study that addresses these methodological issues by considering multiple techniques for building mortality prediction models, including modern machine learning (neural) algorithms, traditional statistical techniques, and meta-learning (ensemble) approaches. This study used data from a multicenter cohort of 10,897 adult Brazilian COVID-19 patients admitted between March 2020 and November 2021 (median age 60 years, interquartile range 48-71; 46% women). We also propose new population-based meta-features that have not previously been proposed in the literature. Stacking achieved the best results reported in the literature for the death prediction task, improving on the previous state of the art by more than 46% in Recall for death, with an AUROC of 0.826 and a MacroF1 of 65.4%. The newly proposed meta-features were highly discriminative of death but fell short of producing large improvements in final prediction performance, suggesting that we may be at the limits of the predictive capability achievable with the current set of ML techniques and (meta-)features. Finally, we investigated how the trained models perform across hospitals and found large differences in classifier performance between hospitals. This further supports the case that errors are produced by factors that cannot be modeled with the current predictors.
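The abstract reports that stacking performed best, but it does not say which base learners or meta-learner were stacked. The pure-Python sketch below is an illustration of the general stacking technique only, not the authors' pipeline, and every modelling choice in it is an assumption. It uses two hypothetical base learners (`fit_logistic`, `fit_knn`), a logistic-regression meta-learner, and synthetic data standing in for patient predictors. The meta-learner is trained on out-of-fold base-model scores. The sketch also defines `macro_f1`, one of the metrics the abstract reports, as the unweighted mean of per-class F1.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Hypothetical base/meta learner: logistic regression fit by plain SGD.
    Returns a function mapping a feature vector to a probability."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_knn(X, y, k=5):
    """Hypothetical base learner: share of positive labels among the k nearest rows."""
    def score(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return score


# Assumed base learners; the abstract does not say which models were stacked.
BASE_LEARNERS = [fit_logistic, fit_knn]


def out_of_fold_features(X, y, n_folds=5, seed=0):
    """Level-1 (meta) features: row i's score from each base learner comes from a
    model trained without row i's fold, so no score is an in-sample prediction."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    meta = [[0.0] * len(BASE_LEARNERS) for _ in X]
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


def fit_stacking(X, y, n_folds=5):
    """Train the meta-learner on out-of-fold scores, then refit base learners on all rows."""
    meta_model = fit_logistic(out_of_fold_features(X, y, n_folds), y)
    base_models = [fit(X, y) for fit in BASE_LEARNERS]
    return lambda x: meta_model([m(x) for m in base_models])


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (F1 = 2TP / (2TP + FP + FN)), so the
    minority class (death) weighs as much as the majority class (survival)."""
    scores = []
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Synthetic stand-in for patient predictors; 1 = death, 0 = survival.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if x[0] + 0.5 * x[1] + rng.gauss(0, 0.5) > 1.0 else 0 for x in X]
    model = fit_stacking(X, y)
    preds = [1 if model(x) >= 0.5 else 0 for x in X]
    # In-sample score, so optimistic; real evaluation needs a held-out set.
    print("MacroF1:", round(macro_f1(y, preds), 3))
```

The main design choice in stacking is to train the meta-learner on out-of-fold scores. If it were trained on in-sample scores, it would tend to overweight base learners that simply memorize the training rows, such as k-NN.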