Predicting host tropism in influenza A viruses: insights from multi-segment nucleotide signatures



Abstract

BACKGROUND: Influenza A virus (IAV) poses a significant public health threat due to its cross-species transmission and complex host adaptation mechanisms. This study integrated whole-genome data from avian, human, swine, and bovine IAV strains, using machine learning to predict viral host tropism based on nucleotide site features and to identify key sites driving host adaptation along with their synergistic effects.

METHODS: A total of 64,000 IAV sequences from avian, human, swine, and bovine hosts were analyzed to build host-prediction models. A four-class classification framework (avian, human, swine, bovine) was constructed using nucleotide site features from all eight genomic segments (PB2, PB1, PA, HA, NP, NA, MP, NS). Eight machine learning algorithms (logistic regression, decision tree, random forest, SVM, KNN, gradient boosting, XGBoost, LightGBM) were benchmarked via 10-fold stratified cross-validation. Model performance was evaluated using accuracy, precision, recall, F1-score, AUPRC, and AUC. SHAP (SHapley Additive exPlanations) analysis prioritized critical nucleotide sites, while bivariate association tests identified synergistic/antagonistic interactions between sites. Nucleotide composition profiles were compared across host groups using hierarchical clustering and heatmap visualization.

RESULTS: The XGBoost algorithm demonstrated the best and most stable performance, achieving an AUC value of over 0.95 in distinguishing human-derived sequences from non-human ones. SHAP analysis identified the top 20 critical nucleotide sites for each gene segment, such as sites 46 and 698 in the NS segment. Nucleotide composition analysis revealed high similarity between human and swine sequences in the HA and PB2 segments, and between avian and bovine sequences. The HA segment was particularly challenging in differentiating human from swine strains. Bivariate site association analysis uncovered significant synergistic or antagonistic effects between key sites within gene segments, forming complex networks. For instance, in the NS segment, a positive prediction contribution was observed when sites 371, 698, and 419 were all G.

CONCLUSIONS: This study advances our mechanistic understanding of IAV host adaptation, identifies molecular determinants for zoonotic risk stratification, and establishes a scalable machine learning framework for predicting viral host tropism through nucleotide signature analysis, thereby enhancing surveillance strategies and informing preventive measures against emerging viral threats.
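The abstract does not say how nucleotide sites were turned into model features or how the folds were drawn, so the following is only a minimal sketch under stated assumptions: each aligned site is one-hot encoded over A/C/G/T (an assumed encoding, not necessarily the authors'), and samples are assigned to folds round-robin within each host class so that every fold keeps roughly the same host proportions. The helper names `one_hot_sites` and `stratified_folds` and the toy sequences are hypothetical; the study used 10 folds, whereas the toy data below only supports 2.

```python
from collections import defaultdict

NUCLEOTIDES = "ACGT"  # assumed alphabet; gaps and ambiguous bases encode as all zeros


def one_hot_sites(seq):
    """Encode an aligned nucleotide sequence as a flat 0/1 vector, four columns per site."""
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return vec


def stratified_folds(labels, k):
    """Assign sample indices to k folds, round-robin within each class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % k].append(idx)
    return folds


# Toy aligned fragments (hypothetical, not real IAV data), two per host class.
toy = [
    ("ACGT", "avian"), ("ACGA", "avian"),
    ("TCGT", "human"), ("TCGA", "human"),
    ("ACTT", "swine"), ("ACTA", "swine"),
    ("GCGT", "bovine"), ("GCGA", "bovine"),
]
X = [one_hot_sites(seq) for seq, _ in toy]
y = [host for _, host in toy]
folds = stratified_folds(y, k=2)

for i, test_idx in enumerate(folds):
    hosts = sorted(y[j] for j in test_idx)
    print(f"fold {i}: test hosts = {hosts}")
```

In a fuller pipeline, each fold would serve once as the held-out set while a classifier is trained on the remaining folds; the feature matrix `X` here has four columns per site, so a per-feature attribution method can be traced back to a specific site and base.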
