Modeling and predicting mutations are critical for COVID-19 and similar pandemic preparedness. However, existing predictive models have yet to integrate the regularity and randomness of viral mutations with minimal data requirements. Here, we develop a non-demanding language model utilizing both regularity and randomness to predict candidate SARS-CoV-2 variants and mutations that might prevail. We constructed the "grammatical frameworks" of the available S1 sequences for dimension reduction and semantic representation to grasp the model's latent regularity. The mutational profile, defined as the frequency of mutations, was introduced into the model to incorporate randomness. With this model, we successfully identified and validated several variants with significantly enhanced viral infectivity and immune evasion by wet-lab experiments. By inputting the sequence data from three different time points, we detected circulating strains or vital mutations for XBB.1.16, EG.5, JN.1, and BA.2.86 strains before their emergence. In addition, our results also predicted the previously unknown variants that may cause future epidemics. With both the data validation and experiment evidence, our study represents a fast-responding, concise, and promising language model, potentially generalizable to other viral pathogens, to forecast viral evolution and detect crucial hot mutation spots, thus warning the emerging variants that might raise public health concern.
A predictive language model for SARS-CoV-2 evolution.
SARS-CoV-2 演化的预测语言模型
阅读:5
作者:Ma Enhao, Guo Xuan, Hu Mingda, Wang Penghua, Wang Xin, Wei Congwen, Cheng Gong
| 期刊: | Signal Transduction and Targeted Therapy | 影响因子: | 52.700 |
| 时间: | 2024 | 起止号: | 2024 Dec 23; 9(1):353 |
| doi: | 10.1038/s41392-024-02066-x | 研究方向: | 炎症/感染 |
| 疾病类型: | 新冠 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
