General Intelligence Framework to Predict Virus Adaptation Based on a Genome Language Model

Abstract

Most human viral pandemics are caused by viruses of animal origin that have adapted to humans. Inferring adaptation from viral genes or the protein sequences they encode is challenging, particularly when the data labels available for modeling are inadequate or the input sequence to be predicted is incomplete. Here, we developed a semi-supervised General Intelligence framework to predict Virus Adaptation based on Language-model-embedded protein sequences (GIVAL) for blind input of virus sequences. The language model in GIVAL, named virus Bidirectional Encoder Representations from Transformers (vBERT), was pretrained for embedding using hidden Markov model-contextualized tokens of viral protein sequences. vBERT outperformed prevalent pretrained models such as DNABERT-2, proteinBERT, ESM-2, Transformer, and Word2Vec at distinguishing viral proteins with labels of varying granularity, such as serotypes and single phenotype-altering mutations. The semi-supervised GIVAL achieved higher accuracy in virus adaptation prediction and greater tolerance of faults in the raw labels of the training dataset, overcoming the obstacles of modeling with insufficient labels and predicting blind input. GIVAL was applicable to adaptation prediction for diverse viruses. For influenza A viruses (IAVs), it predicted higher human adaptation for equine-origin H3N8 IAVs and for bovine H5N1 IAVs with simulated mutations. For coronaviruses, GIVAL predicted that two recently reported Middle East respiratory syndrome-related coronavirus (MERS-CoV)-like virus variants have shifted their receptor-binding adaptation from the MERS-CoV receptor to the severe acute respiratory syndrome coronavirus receptor. For monkeypox viruses, GIVAL quantified an incremental adaptation shift among viral variants that matched the rise in human monkeypox cases. In summary, GIVAL provides a generally intelligent framework for predicting virus adaptation from genotype, with the potential to extend to more genotype-to-phenotype prediction scenarios.
