BACKGROUND: Although different quality controls have been applied at different stages of the sample preparation and data analysis to ensure both reproducibility and reliability of RNA-seq results, there are still limitations and bias on the detectability for certain differentially expressed genes (DEGs). Whether the transcriptional dynamics of a gene can be captured accurately depends on experimental design/operation and the following data analysis processes. The workflow of subsequent data processing, such as reads alignment, transcript quantification, normalization, and statistical methods for ultimate identification of DEGs can influence the accuracy and sensitivity of DEGs analysis, producing a certain number of false-positivity or false-negativity. Machine learning (ML) is a multidisciplinary field that employs computer science, artificial intelligence, computational statistics and information theory to construct algorithms that can learn from existing data sets and to make predictions on new data set. ML-based differential network analysis has been applied to predict stress-responsive genes through learning the patterns of 32 expression characteristics of known stress-related genes. In addition, the epigenetic regulation plays critical roles in gene expression, therefore, DNA and histone methylation data has been shown to be powerful for ML-based model for prediction of gene expression in many systems, including lung cancer cells. Therefore, it is promising that ML-based methods could help to identify the DEGs that are not identified by traditional RNA-seq method. RESULTS: We identified the top 23 most informative features through assessing the performance of three different feature selection algorithms combined with five different classification methods on training and testing data sets. By comprehensive comparison, we found that the model based on InfoGain feature selection and Logistic Regression classification is powerful for DEGs prediction. Moreover, the power and performance of ML-based prediction was validated by the prediction on ethylene regulated gene expression and the following qRT-PCR. CONCLUSIONS: Our study shows that the combination of ML-based method with RNA-seq greatly improves the sensitivity of DEGs identification.
RNA-seq assistant: machine learning based methods to identify more transcriptional regulated genes.
阅读:11
作者:Wang Likai, Xi Yanpeng, Sung Sibum, Qiao Hong
| 期刊: | BMC Genomics | 影响因子: | 3.700 |
| 时间: | 2018 | 起止号: | 2018 Jul 20; 19(1):546 |
| doi: | 10.1186/s12864-018-4932-2 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
