Machine learning approaches to optimize the integration of sociodemographic factors for predicting cancer-specific survival among patients with high-risk prostate cancer

Abstract

BACKGROUND: Sociodemographic factors influence the outcomes of prostate cancer (PCa), yet they are rarely incorporated into clinical risk prediction models. This study assessed whether machine learning approaches could optimize the integration of sociodemographic variables to improve the prediction of cancer-specific survival among patients with high-risk PCa.
MATERIALS AND METHODS: Data from the Surveillance, Epidemiology, and End Results (SEER) database were retrospectively analyzed to identify patients diagnosed with high-risk PCa from 2010 to 2020. Two random forest models were developed: one using clinical and pathological variables (age, stage, prostate-specific antigen level, Gleason grade, time to treatment, and year of diagnosis) and a combined model that added the available sociodemographic features (race, income, marital status, region, and urbanicity). Five-fold cross-validation was used to evaluate model performance and minimize overfitting, and hyperparameters were tuned via grid search. Performance was assessed using the area under the receiver operating characteristic curve (AUC), Brier score, sensitivity, and specificity. Parallel analyses were conducted using XGBoost. Clinical utility was evaluated using decision curve analysis.
RESULTS: We identified 80,858 patients with high-risk PCa. Adding sociodemographic variables significantly improved the AUC of the random forest model from 0.54 (clinical-only model) to 0.72 (combined model; p < 0.001). The Brier score, sensitivity, and specificity were also superior in the combined model (all p < 0.001). XGBoost yielded similar results. Gleason grade was the most predictive factor, and sociodemographic variables, particularly income and geographic region, were also highly informative. Decision curve analysis demonstrated a higher net clinical benefit for the combined model.
CONCLUSIONS: Incorporating sociodemographic variables into machine learning models significantly improved the prediction of cancer-specific survival in high-risk PCa, supporting their inclusion in risk stratification tools.
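
The evaluation steps named in the methods (AUC, Brier score, sensitivity and specificity, decision-curve net benefit, and five-fold cross-validation) have standard definitions, sketched here in plain Python under stated assumptions. The outcome is assumed to be a binary cancer-specific death indicator (1 = died of PCa), as the use of AUC and Brier scores suggests; sensitivity and specificity use a hypothetical 0.5 cutoff because the abstract does not report one; and the fold splitter is stratified, which the abstract does not specify. All function names are illustrative rather than the authors' code, and no random forest or XGBoost model is fitted here.

```python
import random

# Hypothetical helper names; standard metric definitions, not the study's code.

def auc(y_true, scores):
    """Pairwise AUC: chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def brier(y_true, probs):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)

def sens_spec(y_true, probs, threshold=0.5):
    """Sensitivity and specificity after dichotomizing at an assumed `threshold`."""
    tp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, y_true) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, y_true) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

def net_benefit(y_true, probs, pt):
    """Decision-curve net benefit at threshold probability pt: TP/n - FP/n * pt/(1 - pt)."""
    n = len(y_true)
    tp = sum(1 for p, y in zip(probs, y_true) if p >= pt and y == 1)
    fp = sum(1 for p, y in zip(probs, y_true) if p >= pt and y == 0)
    return tp / n - fp / n * pt / (1 - pt)

def stratified_folds(y_true, k=5, seed=0):
    """Split indices into k folds, spreading positives and negatives evenly across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, y in enumerate(y_true) if y == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds
```

The pairwise AUC loop costs O(n_pos × n_neg), which is fine for illustration but would be slow for a cohort of 80,858 patients; a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` scales better. A decision curve is traced by evaluating `net_benefit` over a grid of threshold probabilities for each model and comparing the results against the treat-all and treat-none (net benefit 0) strategies.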