A multi-task CNN learning model for taxonomic assignment of human viruses

一种用于人类病毒分类的多任务卷积神经网络学习模型

阅读:1

Abstract

BACKGROUND: Taxonomic assignment is a key step in the identification of human viral pathogens. Current tools for taxonomic assignment from sequencing reads based on alignment or alignment-free k-mer approaches may not perform optimally in cases where the sequences diverge significantly from the reference sequences. Furthermore, many tools may not incorporate the genomic coverage of assigned reads as part of overall likelihood of a correct taxonomic assignment for a sample. RESULTS: In this paper, we describe the development of a pipeline that incorporates a multi-task learning model based on convolutional neural network (MT-CNN) and a Bayesian ranking approach to identify and rank the most likely human virus from sequence reads. For taxonomic assignment of reads, the MT-CNN model outperformed Kraken 2, Centrifuge, and Bowtie 2 on reads generated from simulated divergent HIV-1 genomes and was more sensitive in identifying SARS as the closest relation in four RNA sequencing datasets for SARS-CoV-2 virus. For genomic region assignment of assigned reads, the MT-CNN model performed competitively compared with Bowtie 2 and the region assignments were used for estimation of genomic coverage that was incorporated into a naïve Bayesian network together with the proportion of taxonomic assignments to rank the likelihood of candidate human viruses from sequence data. CONCLUSIONS: We have developed a pipeline that combines a novel MT-CNN model that is able to identify viruses with divergent sequences together with assignment of the genomic region, with a Bayesian approach to ranking of taxonomic assignments by taking into account both the number of assigned reads and genomic coverage. The pipeline is available at GitHub via https://github.com/MaHaoran627/CNN_Virus .

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。