Prediction of renal cell carcinoma: Development and validation of machine learning model

Abstract

Renal cell carcinoma (RCC) is a leading cause of morbidity and mortality among urinary system malignancies. Early identification is crucial for improving outcomes in patients with RCC. This study aimed to construct and validate a machine learning (ML) model that predicts RCC in at-risk individuals from routine clinical data. Data collected at the Quanzhou First Hospital Affiliated with Fujian Medical University between March 2014 and March 2024 were analyzed retrospectively; 70% of cases were randomly assigned to the training cohort and 30% to the validation cohort. Univariate analysis and hierarchical clustering were used to identify discriminative features and guide selection of the optimal ML algorithm. Models built with seven ML algorithms were evaluated on sensitivity (recall), accuracy, F1-score, area under the receiver operating characteristic curve (AUC), discrimination, calibration, and clinical net benefit. The algorithm with the highest AUC, eXtreme Gradient Boosting (XGBoost), was combined with recursive feature elimination to identify the feature subset that maximized model performance and stability, and the final RCC prediction model was constructed on that subset. The Shapley Additive Explanations (SHAP) method was then used to visualize overall model behavior and individual case predictions. Recursive feature elimination retained 21 clinically relevant variables for model construction: age, total protein, albumin, total bilirubin, alanine aminotransferase, alkaline phosphatase, gamma-glutamyl transpeptidase, glucose, lactate dehydrogenase, creatine kinase-MB, creatinine, potassium-chloride ratio, sodium ion, calcium ion, eosinophil count, hemoglobin, platelet count, Systemic Immune-Inflammation Index, Pan-Immune-Inflammation Value, platelet-lymphocyte ratio, and sodium-chloride ratio.
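Several of the 21 features are composite indices that the abstract names but does not define. The sketch below computes them under stated assumptions: it uses the definitions common in the literature (SII = platelets × neutrophils / lymphocytes; PIV = neutrophils × platelets × monocytes / lymphocytes; PLR = platelets / lymphocytes), cell counts in 10^9/L, and electrolytes in mmol/L. The study's exact formulas and units are not given here, and the function name `derived_indices` and the example values are hypothetical.

```python
def derived_indices(neutrophils: float, lymphocytes: float, monocytes: float,
                    platelets: float, sodium: float, potassium: float,
                    chloride: float) -> dict:
    """Compute composite blood indices from routine laboratory values.

    Assumed units (not stated in the source): cell counts in 10^9/L,
    electrolytes in mmol/L. Formulas follow common literature definitions,
    not necessarily the exact ones used in the study.
    """
    if lymphocytes <= 0 or chloride <= 0:
        raise ValueError("lymphocytes and chloride must be positive")
    return {
        # Systemic Immune-Inflammation Index: platelets x neutrophils / lymphocytes
        "SII": platelets * neutrophils / lymphocytes,
        # Pan-Immune-Inflammation Value: neutrophils x platelets x monocytes / lymphocytes
        "PIV": neutrophils * platelets * monocytes / lymphocytes,
        # Platelet-lymphocyte ratio
        "PLR": platelets / lymphocytes,
        # Electrolyte ratios that also appear among the model features
        "Na_Cl_ratio": sodium / chloride,
        "K_Cl_ratio": potassium / chloride,
    }


# Hypothetical laboratory values for a single patient
features = derived_indices(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5,
                           platelets=250.0, sodium=140.0, potassium=4.0,
                           chloride=100.0)
print(features)
```

Deriving these ratios before training lets a tree model such as XGBoost split on them directly, instead of having to approximate the products and quotients from the raw counts.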
An XGBoost-based RCC prediction model built on these 21 variables achieved an AUC of 0.955 (95% CI: 0.938-0.976) and an average precision of 0.923 in the validation cohort. The calibration curve showed close agreement between predicted and observed risks. Finally, SHAP analysis showed the importance of every model feature and provided case-specific interpretations for clinicians. We developed and validated an ML model that uses routine clinical data for large-scale RCC screening. This cost-effective approach facilitates early detection of and intervention for RCC, which may improve clinical outcomes.
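The validation AUC and average precision reported above are threshold-free summaries of predicted risks. The following minimal, dependency-free sketch shows how each is computed. It assumes binary labels (1 = RCC, 0 = non-RCC) and one predicted risk per case. AUC is computed as the Mann-Whitney probability that a randomly chosen RCC case scores higher than a randomly chosen non-RCC case, with ties counting one half. Average precision is computed as the recall-weighted sum of precision over distinct score thresholds, the same step-wise definition used by scikit-learn's `average_precision_score`. The function names and toy data are hypothetical, and this is not the study's evaluation code.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:  # pairwise O(n_pos * n_neg) comparison: simple but slow for large cohorts
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP = sum_n (R_n - R_{n-1}) * P_n over distinct score thresholds, highest first."""
    total_pos = sum(1 for y in y_true if y == 1)
    if total_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # All cases tied at this score become "predicted positive" together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


# Hypothetical toy validation set: true labels and predicted RCC risks
labels = [1, 1, 0, 0]
risks = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, risks), average_precision(labels, risks))
```

For cohorts with thousands of cases, a rank-based AUC formula or a library implementation would avoid the quadratic pairwise loop.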
