Abstract
BACKGROUND: Lymph node metastasis is a crucial prognostic risk factor for patients with cervical cancer, and accurate prediction of lymph node metastasis is important for guiding treatment selection. Our primary objective was therefore to develop and validate machine learning models for predicting lymph node metastasis; the secondary objective was to use sequencing data to provide evidence of biological plausibility.
METHODS: This study retrospectively included 292 patients with cervical cancer and prospectively recruited 54 patients with cervical cancer. Univariate and multivariate analyses were conducted to identify risk factors associated with lymph node metastasis. Cellular-level validation was then performed using single-cell RNA-sequencing data, and the prognostic value of the risk factor was assessed through bulk RNA-sequencing analysis. Finally, patients were divided into training and retrospective test sets in a 7:3 ratio to develop five machine learning models, and the prospective test set was used to validate the models. The Shapley Additive Explanations (SHAP) method was employed to improve the interpretability of the models' decision processes.
RESULTS: International Federation of Gynecology and Obstetrics (FIGO) 2018 stage, squamous cell carcinoma antigen, monocyte count, and platelet count were significantly correlated with lymph node metastasis. Monocyte count was a significant risk factor (OR = 2.28, p < 0.05). Single-cell RNA-sequencing analysis revealed an increase in monocytes at stage IIIC1 compared with stages IB and IIB. In the bulk RNA-sequencing analysis, monocytes were significantly associated with prognosis and lymph node metastasis. We developed and validated five machine learning models for predicting lymph node metastasis. The NNET model stood out for its predictive ability (training set AUC: 0.86; retrospective test set AUC: 0.79; prospective test set AUC: 0.76). SHAP values quantified the contribution of each feature within the NNET model.
CONCLUSIONS: This study identified a notable association between monocyte count and lymph node metastasis, and bulk and single-cell RNA-sequencing analyses highlighted the importance of monocytes in cervical cancer. The developed interpretable machine learning models can aid clinicians in decision-making. The SHAP method also improves the applicability of these models in real-world settings.
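The evaluation protocol summarized in this abstract (a 7:3 split of a cohort, followed by comparison of AUC values across sets) can be illustrated with a minimal sketch. This is not the authors' code and does not reproduce the NNET model: the cohort below is synthetic, the risk scores are simulated rather than predicted by a trained model, and the helper names `split_7_3` and `auc` are hypothetical. The AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import random


def split_7_3(items, seed=0):
    """Shuffle a copy of `items` and split it into 70% training / 30% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Synthetic stand-in cohort (NOT the study data): 292 made-up patients, each a
# (label, score) pair, with scores that are higher on average for positives.
rng = random.Random(42)
cohort = []
for _ in range(292):
    label = 1 if rng.random() < 0.3 else 0
    score = rng.gauss(0.6 if label else 0.4, 0.15)
    cohort.append((label, score))

train, test = split_7_3(cohort, seed=1)
train_auc = auc([y for y, _ in train], [s for _, s in train])
test_auc = auc([y for y, _ in test], [s for _, s in test])
print(f"train n={len(train)}, test n={len(test)}")
print(f"train AUC={train_auc:.2f}, test AUC={test_auc:.2f}")
```

With 292 records, this split yields 204 training and 88 test records. In a real analysis the scores would come from a model fitted on the training set only, and a stratified split may be preferable when the positive class is rare.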