Omics-CNN: A comprehensive pipeline for predictive analytics in quantitative omics using one-dimensional convolutional neural networks

Abstract

BACKGROUND AND OBJECTIVE: The development of machine learning models that can predict severe diseases has been a major concern of the scientific community. This study seeks to extend Convolutional Neural Networks (CNNs), a highly sophisticated tool, so that they can be applied to multidimensional omics data classification problems, and to test the newly introduced method on publicly available transcriptomics and proteomics data.

METHODS: We introduce Omics-CNN, a CNN-based pipeline that couples Convolutional Neural Networks with dimensionality reduction, preprocessing, clustering, and explainability techniques, making them suitable for building highly accurate and interpretable classification models from high-throughput omics data. The tool has the potential to classify patients based on the expression of genetic and clinical factors and to identify features that can act as diagnostic biomarkers. For dimensionality reduction, both univariate and multivariate techniques were explored and compared. After training the model, Gradient-weighted Class Activation Mapping (Grad-CAM) analysis was performed to determine the features most important for classifying the samples.

RESULTS: The pipeline was applied to one transcriptomics dataset and one proteomics dataset to identify diagnostic models and biosignatures for Ischemic Stroke (IS) and COVID-19 infection, reporting highly accurate biosignatures with accuracies of 96% and 95.41%, respectively. Classification models based on only a small subset of features achieved lower predictive accuracy, but identified a compact transcript biosignature (KRT15, VPRBP, TNFRSF4, GORASP2) for Ischemic Stroke and a compact protein biosignature (ADGRB3, VNN2, AGER, CIAPIN1) for COVID-19 diagnosis.
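The abstract says univariate dimensionality reduction was explored but does not name the test. As a minimal sketch, assuming a binary case/control design, each feature could be scored with Welch's t-statistic, the top k features kept, and each sample reshaped into a one-channel sequence of length k for a 1D CNN. The helpers `welch_t`, `select_top_k` and `to_cnn_input`, and the toy matrix, are hypothetical illustrations, not part of Omics-CNN.

```python
import math
import statistics
from typing import List, Sequence


def welch_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t-statistic comparing one feature between two groups (hypothetical helper)."""
    va = statistics.variance(a)  # sample variance, n - 1 denominator
    vb = statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    diff = statistics.mean(a) - statistics.mean(b)
    if se == 0.0:
        # Zero within-group variance: perfect separation unless the means coincide.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def select_top_k(X: List[List[float]], y: List[int], k: int) -> List[int]:
    """Indices of the k features with the largest |t| between class 1 and class 0."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(col1, col0)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def to_cnn_input(X: List[List[float]], selected: List[int]) -> List[List[List[float]]]:
    """Keep the selected columns, shaped (samples, channels=1, length=k) for a 1D CNN."""
    return [[[row[j] for j in selected]] for row in X]


# Hypothetical toy data: 6 samples x 4 features, 3 controls (0) and 3 cases (1).
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 5.1, 0.1, 2.9],
    [0.9, 4.9, 0.3, 3.1],
    [3.0, 5.0, 0.2, 1.0],
    [3.2, 5.2, 0.1, 1.1],
    [2.9, 4.8, 0.3, 0.9],
]
y = [0, 0, 0, 1, 1, 1]

top = select_top_k(X, y, 2)
print(top)                       # [3, 0]
print(to_cnn_input(X, top)[0])   # [[3.0, 1.0]]
```

A univariate filter like this is cheap and keeps the original features, which matters when the goal is a named biosignature; a multivariate alternative such as PCA would produce components that no longer map one-to-one onto transcripts or proteins.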
CONCLUSIONS: Omics-CNN overcame the inherent problems of applying Convolutional Neural Networks to training diagnostic models on quantitative omics data. It outperformed previous machine learning models developed on the same datasets for Ischemic Stroke and COVID-19 diagnosis, and it determined the most contributing biomarkers for both diseases.
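Grad-CAM is the step that turns a trained network into a biomarker ranking. The abstract does not describe the architecture, so the sketch below assumes a deliberately tiny, hypothetical model: one Conv1D layer with zero "same" padding, ReLU, global average pooling, and a linear class score. For that model the gradient of the class score with respect to every activation of filter k is w_k / L, so the Grad-CAM weight is α_k = w_k / L and the map is ReLU(Σ_k α_k A_k), one relevance value per input feature. In a real framework these gradients would come from autograd rather than this closed form; `grad_cam_1d`, `rank_features` and the toy values are illustrative assumptions.

```python
from typing import List, Sequence


def conv1d_relu(x: Sequence[float], kernel: Sequence[float], bias: float) -> List[float]:
    """1D cross-correlation with zero 'same' padding (odd kernel length), then ReLU."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = bias
        for t, w in enumerate(kernel):
            j = i + t - half
            if 0 <= j < len(x):
                s += w * x[j]
        out.append(max(0.0, s))
    return out


def grad_cam_1d(x, kernels, biases, class_weights) -> List[float]:
    """Grad-CAM map for a hypothetical Conv1D -> ReLU -> GAP -> linear model.

    class_weights[k] links pooled filter k to the target class score.
    """
    L = len(x)
    activations = [conv1d_relu(x, k, b) for k, b in zip(kernels, biases)]
    # score = sum_k w_k * mean_i A_k[i], so d score / d A_k[i] = w_k / L at every i;
    # averaging that gradient over positions gives alpha_k = w_k / L.
    alphas = [w / L for w in class_weights]
    return [
        max(0.0, sum(a * act[i] for a, act in zip(alphas, activations)))
        for i in range(L)
    ]


def rank_features(names: Sequence[str], cam: Sequence[float]) -> List[str]:
    """Order feature names by Grad-CAM relevance, highest first (stable for ties)."""
    return [n for n, _ in sorted(zip(names, cam), key=lambda p: p[1], reverse=True)]


# Hypothetical toy input: 4 selected features, 2 filters, weights for one class.
names = ["g1", "g2", "g3", "g4"]
x = [0.0, 2.0, 0.0, 1.0]
cam = grad_cam_1d(
    x,
    kernels=[[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]],
    biases=[0.0, 0.0],
    class_weights=[1.0, -0.5],
)
print(cam)                        # [0.0, 0.25, 0.0, 0.125]
print(rank_features(names, cam))  # ['g2', 'g4', 'g1', 'g3']
```

Because "same" padding keeps the sequence length equal to the number of selected features, each map position corresponds to one input feature; averaging such maps over correctly classified samples of a class is one plausible way to obtain a compact candidate biosignature.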