A random forest classifier predicts recurrence risk in patients with ovarian cancer

Abstract

Ovarian cancer (OC) is associated with a poor prognosis due to difficulties in early detection. The aims of the present study were to construct a recurrence risk prediction model and to reveal important OC genes or pathways. RNA sequencing data were obtained for 307 OC samples, and the corresponding clinical data were downloaded from The Cancer Genome Atlas database. Additionally, two validation datasets, GSE44104 (20 recurrent and 40 non‑recurrent OC samples) and GSE49997 (204 OC samples), were obtained from the Gene Expression Omnibus database. Differentially expressed genes were screened using the differential expression via distance synthesis algorithm, followed by gene ontology enrichment analysis and weighted gene coexpression network analysis (WGCNA). Furthermore, subnetwork analysis was conducted for the protein‑protein interaction (PPI) network using the BioNet package. Finally, a random forest classifier was constructed based on the subnetwork nodes, and its reliability was validated using the GSE44104 and GSE49997 validation datasets. A total of 44 upregulated and 117 downregulated genes were identified in the recurrent samples. Enrichment analysis indicated that cytochrome P450 family 17 subfamily A member 1 (CYP17A1) was associated with 'positive regulation of steroid hormone biosynthetic processes'. WGCNA identified turquoise and grey modules that were significantly correlated with status and prognosis. A significant PPI subnetwork containing 16 nodes was also identified, comprising transcription factor GATA‑4; fibroblast growth factor 9; aromatase; 3β‑hydroxysteroid dehydrogenase/Δ5‑4‑isomerase type 2; corticosteroid 11β‑dehydrogenase isozyme 1; CYP17A1; pituitary homeobox 2; left‑right determination factor 1; homeobox protein ARX; estrogen receptor β; steroidogenic factor 1; forkhead box protein L2; myocardin; steroidogenic acute regulatory protein, mitochondrial; vesicular inhibitory amino acid transporter; and twist‑related protein 1.
A random forest classifier constructed using the subnetwork nodes as feature genes exhibited a 92% true positive rate when classifying recurrent and non‑recurrent OC samples. The classification performance of the random forest classifier was validated using the two independent datasets. Overall, 44 upregulated and 117 downregulated genes associated with OC recurrence were identified. Furthermore, the 16 subnetwork node genes that were identified may be important molecules in OC recurrence.
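
To make the classification step concrete, the following is a minimal sketch of training a random forest on 16 feature genes and computing the true positive rate (TP / (TP + FN)) on held-out samples. It is not the authors' pipeline: the abstract does not name the software used for the classifier, so scikit-learn is an assumption here, and the expression matrix is synthetic data from `make_classification` rather than TCGA or GEO data. The helper names `true_positive_rate` and `train_and_evaluate`, the sample count, and the hyperparameters are hypothetical choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Matches the number of subnetwork nodes in the abstract; the values are synthetic.
N_FEATURE_GENES = 16


def true_positive_rate(y_true, y_pred):
    """Return TP / (TP + FN), where label 1 means 'recurrent'."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tp / (tp + fn) if (tp + fn) else float("nan")


def train_and_evaluate(seed=0):
    """Fit a random forest on synthetic data and score it on a held-out split."""
    # Synthetic stand-in for an expression matrix:
    # rows are samples, columns are the 16 feature genes (assumed layout).
    X, y = make_classification(
        n_samples=300,
        n_features=N_FEATURE_GENES,
        n_informative=8,
        n_redundant=2,
        random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    return clf, true_positive_rate(y_test, y_pred)


clf, tpr = train_and_evaluate()
print(f"True positive rate on held-out synthetic samples: {tpr:.2f}")
```

In a real analysis, the held-out split would be replaced by independent cohorts, as the study does with GSE44104 and GSE49997, so that the reported rate reflects generalization rather than fit to the training data.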
