Identification of human pathogens in soil by virulence gene-based machine learning method



Abstract

Soils are important reservoirs of human pathogenic bacteria, which can spread to humans through various pathways. Metagenomics enables high-throughput pathogen identification by mapping sequencing reads to known pathogen genomes. However, this approach has limitations: sequence assembly is time-consuming, and reliance on reference databases may overlook potential pathogens that lack close genomic matches. Here, we developed a novel virulence factor (VF)-based machine learning method built on the K-Nearest Neighbors model (VF-KNN) for identifying human pathogenic bacteria in soil metagenomes. By learning the VF features of pathogenic and non-pathogenic bacteria, VF-KNN performed well in soil pathogen identification (AUC: 0.95; accuracy: 0.85). The model was further validated on 61 pathogenic strains isolated from soil, with a prediction accuracy of 0.95. For the 15 most frequent soil pathogens, prediction accuracy exceeded 0.90 at 0.4X-1.0X genome coverage. VFs contributing significantly to pathogen identification were associated with functions such as regulation, effector delivery, and motility. As estimated by VF-KNN, the average abundance of total potential pathogens in topsoils across China was 0.44% (n = 336), and these pathogens were predominantly concentrated in the eastern coastal provinces. Compared with the conventional method based on a predefined pathogen list, VF-KNN identified 28% more potential pathogenic species, including some newly reported pathogens absent from the predefined list (e.g., Mycolicibacterium cosmeticum). Agricultural land exhibited significantly higher pathogen abundance and diversity than other land types. The newly developed VF-KNN method is applicable to pathogen identification in a broader range of environments.
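
To make the classification idea concrete, the following is a minimal sketch of a K-Nearest Neighbors vote over binary VF presence/absence profiles. It is not the authors' implementation: the abstract does not state the feature encoding, distance metric, or value of k, so the Hamming distance, k = 3, the helper names (hamming, knn_predict), the "adherence" and "toxin" columns, and all toy profiles below are assumptions made purely for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Count the VF categories whose presence/absence differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k=3):
    """Return the majority label among the k training profiles nearest to `query`.

    `train` is a list of (profile, label) pairs. Distance ties are broken by
    training order, because sorted() is stable.
    """
    neighbours = sorted(train, key=lambda item: hamming(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set, invented for illustration only. Columns are hypothetical
# VF categories: [regulation, effector delivery, motility, adherence, toxin].
train = [
    ((1, 1, 1, 1, 1), "pathogen"),
    ((1, 1, 1, 0, 1), "pathogen"),
    ((1, 1, 0, 1, 1), "pathogen"),
    ((0, 0, 0, 0, 0), "non-pathogen"),
    ((0, 0, 1, 0, 0), "non-pathogen"),
    ((1, 0, 0, 0, 0), "non-pathogen"),
]

# A profile rich in VF categories sits closest to the pathogen examples,
# while a sparse profile sits closest to the non-pathogen examples.
rich_profile = knn_predict(train, (1, 1, 1, 1, 0), k=3)
sparse_profile = knn_predict(train, (0, 1, 0, 0, 0), k=3)
print(rich_profile, sparse_profile)
```

In practice a library implementation with cross-validated hyperparameters would replace this hand-rolled vote; the sketch only shows how VF profiles, rather than whole-genome matches, can drive the pathogen/non-pathogen decision.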
