Abstract
BACKGROUND: The molecular mechanisms underlying benign prostatic hyperplasia (BPH) and prostate cancer (PCa) remain unclear. This study aimed to identify programmed cell death (PCD)-associated genes involved in BPH and PCa development through integrated bioinformatics analysis and machine learning on publicly available genomic datasets. METHODS: The GSE119195 and GSE55597 datasets were downloaded from the Gene Expression Omnibus (GEO) database, and differentially expressed genes (DEGs) were identified using the limma package. Core genes were filtered using four machine learning methods [least absolute shrinkage and selection operator (LASSO) regression, eXtreme gradient boosting (XGBoost), random forest, and Boruta], and their intersection was taken. RESULTS: We identified 159 key genes from the intersection of the two DEG sets. Fifteen hub genes were obtained by intersecting these 159 DEGs with PCD genes. After further dimension reduction of the 15 hub genes using the four machine learning techniques, two hub genes [bone morphogenetic protein 5 (BMP5) and cytochrome P450 family 1 subfamily B member 1 (CYP1B1)] were ultimately chosen. The nomogram showed that BMP5 and CYP1B1 together were a reliable indicator of PCa and BPH risk. The risk score displayed an area under the curve (AUC) value of 0.988 in the GSE55597 dataset and an AUC value of 1 in the GSE119195 dataset. In addition, the prognosis of PCa patients was significantly correlated with CYP1B1, Gleason score, and the tumor (T) stage of the tumor-node-metastasis (TNM) classification. CONCLUSIONS: We established a prediction model with high predictive ability in the analyzed datasets. This bioinformatics analysis identified candidate DEGs in the prostate, such as BMP5 and CYP1B1, which might provide further insight into the pathogenesis of BPH and PCa. However, these findings warrant further validation in prospective, real-world clinical studies.
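
The two computational steps named above, intersecting the gene sets selected by several methods and scoring a risk model by AUC, can be illustrated with a minimal Python sketch. This is not the authors' pipeline (which is not given here): the per-method selections, the placeholder names GENE_A, GENE_B, and GENE_C, and the toy scores and labels are all hypothetical, and the helper functions `intersect_gene_sets` and `auc` are introduced only for illustration. Only BMP5 and CYP1B1 are taken from the abstract.

```python
from functools import reduce


def intersect_gene_sets(gene_sets):
    """Return the genes present in every input set, sorted for stable output."""
    if not gene_sets:
        return []
    return sorted(reduce(set.intersection, (set(g) for g in gene_sets)))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical per-method selections (assumption, not data from the study).
selected = {
    "lasso": {"BMP5", "CYP1B1", "GENE_A"},
    "xgboost": {"BMP5", "CYP1B1", "GENE_B"},
    "random_forest": {"BMP5", "CYP1B1", "GENE_A", "GENE_C"},
    "boruta": {"BMP5", "CYP1B1"},
}
core_genes = intersect_gene_sets(list(selected.values()))

# Toy risk scores and case/control labels (assumption, not data from the study).
toy_scores = [0.9, 0.8, 0.35, 0.4, 0.1]
toy_labels = [1, 1, 1, 0, 0]
toy_auc = auc(toy_scores, toy_labels)

print(core_genes)
print(round(toy_auc, 3))
```

An AUC of 1, as reported for GSE119195, corresponds in this pairwise formulation to every case scoring higher than every control, which on a small dataset is one reason external validation matters.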