Abstract
BACKGROUND: Accurately predicting survival outcomes in patients with lung cancer who receive chemotherapy remains challenging. OBJECTIVE: To improve clinical management of this population, this study developed a multivariate machine learning (ML) model to assess all-cause mortality risk in chemotherapy-treated patients with lung cancer. METHODS: This study retrospectively enrolled 1278 postchemotherapy patients with lung cancer from Guangzhou Chest Hospital between 2017 and 2019. Candidate features, including demographic characteristics, environmental exposures, clinical information, and patient-reported symptoms, were collected via questionnaires and the electronic medical record system. Survival status and date of death were ascertained twice a year. A total of 84 predictive models were constructed on the training set using 5 ML algorithms, either individually or in pairwise combinations. The optimal model was identified by its concordance index on the testing set, and its performance was further evaluated with receiver operating characteristic curves, calibration curves, and decision curve analysis. Additionally, Shapley Additive Explanations and restricted cubic splines were applied for feature attribution analysis. RESULTS: The optimal model retained 21 prognosis-associated features, including age, sex, BMI, smoking status, environmental smoke, total score trajectories on the MD Anderson Symptom Inventory for Lung Cancer, cluster of differentiation 56, TNM stage, histology, and prechemotherapy blood biomarkers. On the testing set, the model achieved a concordance index of 0.702 (95% CI 0.652-0.753). The decision curves demonstrated positive clinical benefit at risk thresholds of 0.40-0.69, 0.62-0.99, and 0.72-0.99 for 1-, 3-, and 5-year mortality predictions, respectively.
The calibration curves showed that the predicted mortality probabilities fluctuated around the observed probabilities, with Brier scores of 0.20, 0.18, and 0.11 for 1-, 3-, and 5-year predictions, respectively. The area under the receiver operating characteristic curve was 0.740, 0.777, and 0.915 for 1-, 3-, and 5-year mortality predictions, respectively. Feature attribution analysis showed that the significant features were predictive of all-cause mortality risk in chemotherapy-treated patients with lung cancer. CONCLUSIONS: Our ML model exhibited acceptable discrimination, calibration, and clinical benefit in predicting mortality risk in chemotherapy-treated patients with lung cancer and could help clinicians with personalized prognostic management.
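To illustrate the model selection metric named in the abstract, the following is a minimal sketch of a concordance index computed on toy data, assuming Harrell's formulation. The function name, argument names, and values are hypothetical and are not taken from the study, and the sketch skips pairs with tied follow-up times for simplicity.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs whose predicted risks are ordered
    consistently with their observed survival (assumed Harrell's C)."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the shorter time is an observed death;
        # pairs with identical times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher predicted risk died earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up in months, 1 = death observed, 0 = censored.
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 1]
toy_risks = [0.9, 0.4, 0.2, 0.6]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
print(f"C-index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.702 should be read.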