Machine learning discrimination of Gleason scores below GG3 and above GG4 for HSPC patients diagnosis

Abstract

This study aimed to develop machine learning (ML)-assisted models for analyzing datasets related to Gleason scores in prostate cancer, to perform statistical analyses on those datasets, and to identify meaningful features. We retrospectively collected data from 717 hormone-sensitive prostate cancer (HSPC) patients at Yunnan Cancer Hospital; data from 526 of these patients were used for modeling. Seven auxiliary models were built on 21 clinical biochemical indicators and features using Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (Adaboost), and an artificial neural network (ANN). Evaluation metrics were accuracy (ACC), precision (PRE), specificity (SPE), sensitivity (SEN, also called recall), F1 score, and area under the curve (AUC); they were visualized using confusion matrices and ROC curves. Among the seven models, the ensemble learning methods RF, XGBoost, and Adaboost performed best. RF achieved a training dataset score of 0.769 (95% CI: 0.759-0.835) and a testing dataset score of 0.755 (95% CI: 0.660-0.760) (AUC: 0.786, 95% CI: 0.722-0.803). XGBoost achieved a training dataset score of 0.755 (95% CI: 0.711-0.809) and a testing dataset score of 0.745 (95% CI: 0.660-0.764) (AUC: 0.777, 95% CI: 0.726-0.798). Adaboost scored 0.789 on the training dataset (95% CI: 0.782-0.857) and 0.774 on the testing dataset (95% CI: 0.651-0.774) (AUC: 0.799, 95% CI: 0.703-0.802). In terms of feature importance (FI), bone metastases at first visit, prostatic volume, age, and T1-T2 accounted for the largest shares in RF; fPSA, TPSA, and tumor burden accounted for the largest shares in Adaboost; and f/TPSA, LDH, and testosterone had the highest proportions in XGBoost.
Our findings indicate that ensemble learning methods demonstrate good performance in classifying HSPC patient data, with TNM staging and fPSA being important classification indicators. These discoveries provide valuable references for distinguishing different Gleason scores, facilitating more accurate patient assessments and personalized treatment plans.
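
To make the evaluation metrics named above concrete, the following is a minimal sketch of how ACC, PRE, SPE, SEN (recall), and F1 score follow from the four cells of a binary confusion matrix. The class name `ConfusionCounts`, the function `binary_metrics`, and the example counts are hypothetical illustrations; they are not taken from the study and do not reproduce its results or its code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical container)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard binary classification metrics from counts."""
    pre = safe_div(c.tp, c.tp + c.fp)          # precision
    sen = safe_div(c.tp, c.tp + c.fn)          # sensitivity (recall)
    spe = safe_div(c.tn, c.tn + c.fp)          # specificity
    acc = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)  # accuracy
    f1 = safe_div(2 * pre * sen, pre + sen)    # harmonic mean of PRE and SEN
    return {"ACC": acc, "PRE": pre, "SPE": spe, "SEN": sen, "F1": f1}


# Illustrative counts only; these are NOT the study's data.
example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
metrics = binary_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it depends on the ranking of predicted probabilities across all thresholds, not on a single confusion matrix.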
