Abstract
Background: Survival analysis is essential for studying time-to-event outcomes, as it describes how the probability of an event changes over time. A range of survival analysis techniques, from traditional statistical models to state-of-the-art machine learning algorithms, support healthcare intervention and policy decisions. However, their comparative performance remains under discussion.
Methods: We conducted a comparative study of several survival analysis methods: the accelerated failure time model, Cox proportional hazards (CoxPH), stepwise CoxPH, the elastic net penalized Cox model, random survival forests, gradient boosting, AutoScore-Survival, DeepSurv, a neural-network-based time-dependent Cox model, and the DeepHit survival neural network. We assessed model discrimination with the concordance index (C-index) and calibration with the integrated Brier score (IBS), and also considered model interpretability. Prediction performance was evaluated independently on the inpatient dataset of Singapore General Hospital (SGH) from 2017 to 2019 and on Asian patients from the MIMIC-IV Clinical Database (MIMIC-IV). The outcome was 90-day all-cause mortality, predicted from patient demographics, clinicopathological features, and historical data.
Results: The C-index results indicate that the deep learning models performed comparably to the other methods, with DeepSurv producing the best discrimination in both SGH (C-index: 0.893) and MIMIC-IV (C-index: 0.794). DeepSurv also had the best calibration, with an IBS of 0.0406 in SGH and 0.1473 in MIMIC-IV; these results all use the full variable set. Moreover, AutoScore-Survival, which uses a minimal variable subset, is easy to interpret and achieved good discrimination (C-index in SGH: 0.867; MIMIC-IV: 0.788) and calibration (IBS in SGH: 0.0439; MIMIC-IV: 0.1263).
Conclusion: All survival models were satisfactory in predicting mortality after hospital admission. This study provides recommendations for model selection based on the characteristics of the different models.
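
To make the discrimination metric concrete, the following is a minimal sketch of Harrell's C-index for right-censored data. It is an illustration only, not the evaluation code used in this study: the function name `harrell_c_index` and the toy follow-up times, event indicators, and risk scores are hypothetical, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs that the risk scores order correctly.

    times       -- follow-up time per subject (e.g. days since admission)
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean earlier death
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # a is the subject with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the second one censored at day 10.
times = [5, 10, 20, 30]
events = [1, 0, 1, 1]
risk_scores = [0.9, 0.4, 0.1, 0.6]
print(harrell_c_index(times, events, risk_scores))  # 0.75
```

In this toy example four pairs are comparable and three are ordered correctly, giving 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the C-index values reported above should be read.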