Discovery of potential biomarkers for lung cancer classification based on human proteome microarrays using Stochastic Gradient Boosting approach

Abstract

PURPOSE: Early identification of lung cancer (LC) would considerably facilitate its intervention and prevention. The human proteome microarray approach can serve as a "liquid biopsy" that complements conventional LC diagnosis, but it requires advanced bioinformatics methods such as feature selection (FS) and refined machine learning models.

METHODS: A two-stage FS methodology combining Pearson's correlation (PC) with either a univariate filter (SBF) or recursive feature elimination (RFE) was used to reduce redundancy in the original dataset. The Stochastic Gradient Boosting (SGB), Random Forest (RF), and Support Vector Machine (SVM) techniques were applied to build ensemble classifiers based on four subsets. The synthetic minority oversampling technique (SMOTE) was used to preprocess the imbalanced data.

RESULTS: The FS approaches with SBF and RFE extracted 25 and 55 features, respectively, 14 of which overlapped. All three ensemble models achieved high accuracy (0.867 to 0.967) and sensitivity (0.917 to 1.00) on the test datasets, with SGB on the SBF subset outperforming the others. SMOTE improved model performance during training. Three of the top selected candidate biomarkers (LGR4, CDC34, and GHRHR) are strongly suggested to play a role in lung tumorigenesis.

CONCLUSION: A novel hybrid FS method combined with classical ensemble machine learning algorithms was applied for the first time to the classification of protein microarray data. The parsimonious model constructed by the SGB algorithm with the appropriate FS and SMOTE approach performs well in the classification task, with high sensitivity and specificity. The standardization and innovation of bioinformatics approaches for protein microarray analysis need further exploration and validation.
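
The two-stage FS idea described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the correlation cutoff nor the univariate test, so the 0.9 threshold, the Welch t statistic, the function names, and the toy protein intensities below are all assumptions made for illustration. Stage 1 drops features that are highly Pearson-correlated with one already kept; stage 2 ranks the survivors by a univariate score.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_correlated(columns, threshold=0.9):
    """Stage 1 (assumed cutoff): keep a feature only if its |r| with
    every previously kept feature does not exceed the threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def welch_t(values, labels):
    """Stage 2 score (assumed test): absolute Welch t statistic
    between the samples labelled 1 and the samples labelled 0."""
    a = [v for v, lab in zip(values, labels) if lab == 1]
    b = [v for v, lab in zip(values, labels) if lab == 0]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
    vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return abs(ma - mb) / se if se else 0.0

def two_stage_select(columns, labels, threshold=0.9, top_k=2):
    """Run the correlation filter, then keep the top_k survivors by score."""
    survivors = drop_correlated(columns, threshold)
    ranked = sorted(survivors,
                    key=lambda name: welch_t(columns[name], labels),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical toy data: 3 controls (0) and 3 cases (1), 4 made-up features.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "p1":     [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # separates the classes
    "p1_dup": [2.0, 2.4, 1.8, 6.0, 6.2, 5.8],  # 2 * p1, fully redundant
    "p2":     [5.0, 4.0, 6.0, 5.0, 4.0, 6.0],  # same in both classes
    "p3":     [2.0, 3.0, 1.5, 3.2, 2.4, 3.9],  # weakly informative
}

survivors = drop_correlated(columns)
selected = two_stage_select(columns, labels)
print(survivors)
print(selected)
```

In this toy case the redundant copy is removed in stage 1 and the uninformative feature is ranked last in stage 2. A real analysis would run the selection inside cross-validation, so that the test samples never influence which features are chosen.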
