CancerSubminer: an integrated framework for cancer subtyping using supervised and unsupervised learning on DNA methylation profiles

Abstract

Human cancer is highly heterogeneous, resulting in variable drug resistance and clinical outcomes. This complexity hinders accurate prognosis prediction and the development of targeted therapies. Molecular subtyping addresses these challenges by grouping cancers into more homogeneous subsets based on molecular characteristics, enabling subtype-specific treatment strategies. Because it captures differential therapeutic responses, subtyping is crucial for early diagnosis, personalized therapy, and improved survival. Existing approaches to cancer subtyping fall into supervised and unsupervised categories. Supervised methods, often trained on The Cancer Genome Atlas (TCGA), rely on predefined subtype annotations but are limited in generalizability and cannot discover novel subtypes. Unsupervised methods can identify new subtypes but may overlook widely recognized ones, which makes their results hard to reconcile with established classifications. Multi-omics approaches improve accuracy but are constrained by the cost and difficulty of data collection. We propose CancerSubminer, a hybrid subtyping framework that integrates supervised and unsupervised learning. A subtype classifier is first trained on labeled data, after which clustering is applied to the extracted features, with low-confidence samples reassigned to refine subtype boundaries. The model is then retrained with the refined subtypes, and adversarial training corrects batch effects and learns domain-invariant features across labeled TCGA and unlabeled external datasets. A subsequent semi-supervised fine-tuning phase aligns subtypes between datasets and designates low-confidence samples as potential novel-subtype candidates. CancerSubminer was evaluated on five cancer types (breast, bladder, brain, kidney, and thyroid) using TCGA methylation data with annotated subtypes and unlabeled datasets from the Gene Expression Omnibus.
The framework outperformed state-of-the-art subtyping models (iClusterPlus, iClusterBayes, NEMO) and clustering methods (Spectral, K-means). Kaplan-Meier survival analysis demonstrated significant prognostic separation (p < 0.05) for all five cancer types, including thyroid cancer, where the predefined subtypes showed no significant separation but the CancerSubminer-derived subtypes did. These findings highlight CancerSubminer's ability to identify distinct prognostic subtypes, mitigate batch effects, and improve prognostic stratification across heterogeneous datasets. CancerSubminer is publicly available at https://github.com/joungmin-choi/CancerSubminer.
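
The boundary-refinement step described above (keep confident classifier predictions, reassign low-confidence samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `refine_labels`, the 0.7 confidence threshold, and the toy data are all hypothetical, and a simple nearest-centroid rule stands in for the clustering step, whose exact form the abstract does not specify.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def refine_labels(features, probs, threshold=0.7):
    """Keep confident predictions; move the rest to the nearest centroid.

    features:  list of feature vectors (stand-in for extracted features)
    probs:     list of per-class probability vectors from a classifier
    threshold: hypothetical confidence cutoff, not taken from the paper
    Returns (labels, low_confidence_indices).
    """
    # Predicted class = index of the largest probability.
    predicted = [max(range(len(p)), key=p.__getitem__) for p in probs]
    confident = [max(p) >= threshold for p in probs]

    # Build one centroid per class from confident samples only.
    members = {}
    for x, label, ok in zip(features, predicted, confident):
        if ok:
            members.setdefault(label, []).append(x)
    centroids = {label: centroid(pts) for label, pts in members.items()}

    labels, low_conf = [], []
    for i, (x, label, ok) in enumerate(zip(features, predicted, confident)):
        if not ok:
            low_conf.append(i)
            if centroids:
                # Nearest-centroid reassignment (assumed stand-in for clustering).
                label = min(centroids, key=lambda c: math.dist(x, centroids[c]))
        labels.append(label)
    return labels, low_conf


# Toy data: two well-separated groups plus one ambiguous sample (index 4).
features = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [4.9, 5.0], [4.5, 4.8]]
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.55, 0.45]]

labels, low_conf = refine_labels(features, probs)
print(labels, low_conf)
```

In this toy run, the last sample is weakly predicted as class 0 but lies next to the class 1 group in feature space, so it is flagged as low-confidence and moved to class 1. In a setting like the one the abstract describes, such flagged samples could instead be held out as candidates for a novel subtype.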
