Abstract
Predictive models of coronavirus disease 2019 (COVID-19) severity have been established; however, complex interactions among risk factors limit the usefulness of conventional statistical methods. This study aimed to establish a simple and accurate predictive model for COVID-19 severity using an explainable machine learning approach. A total of 3,301 patients aged ≥ 18 years who were diagnosed with COVID-19 between February 2020 and October 2022 were included. The discovery cohort comprised patients whose disease onset fell before October 1, 2020 (N = 1,023), and the validation cohort comprised the remaining patients (N = 2,278). Pointwise linear and logistic regression models were used to extract 41 features. Reinforcement learning was then used to generate a simple model with high predictive accuracy. The primary evaluation metric was the area under the receiver operating characteristic curve (AUC). The predictive model achieved an AUC of ≥ 0.905 using four features: serum albumin level, lactate dehydrogenase level, age, and neutrophil count. The highest AUC was 0.906 (sensitivity, 0.842; specificity, 0.811) in the discovery cohort and 0.861 (sensitivity, 0.804; specificity, 0.675) in the validation cohort. These simple and well-structured predictive models may aid in patient management and the selection of therapeutic interventions.
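For readers less familiar with the reported metrics, the following minimal sketch shows how AUC, sensitivity, and specificity can be computed from predicted risk scores. It is an illustration only, not the study's code: the function names `roc_auc` and `sensitivity_specificity`, the toy labels and scores, and the 0.5 threshold are all hypothetical and are not drawn from the study data.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen severe case (label 1)
    receives a higher score than a randomly chosen non-severe case (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Classify score >= threshold as severe, then return
    (sensitivity, specificity) = (TP / (TP + FN), TN / (TN + FP))."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example (hypothetical values, NOT study data).
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_sens, toy_spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
```

AUC does not depend on the choice of threshold, whereas sensitivity and specificity describe a single operating point. An AUC can therefore be reported alongside one sensitivity and specificity pair, as in the discovery and validation results above.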