Unraveling City-Specific Microbial Signatures and Identifying Sample Origins for the Data From CAMDA 2020 Metagenomic Geolocation Challenge

揭示城市特异性微生物特征并识别CAMDA 2020宏基因组地理定位挑战赛数据的样本来源

阅读:4

Abstract

The composition of microbial communities has been known to be location-specific. Investigating the microbial composition across different cities enables us to unravel city-specific microbial signatures and further predict the origin of unknown samples. As part of the CAMDA 2020 Metagenomic Geolocation Challenge, MetaSUB provided the whole genome shotgun (WGS) metagenomics data from samples across 28 cities along with non-microbial city data for 23 of these cities. In our solution to this challenge, we implemented feature selection, normalization, clustering and three methods of machine learning to classify the cities based on their microbial compositions. Of the three methods, multilayer perceptron obtained the best performance with an error rate of 19.60% based on whether the correct city received the highest or second highest number of votes for the test data contained in the main dataset. We then trained the model to predict the origins of samples from the mystery dataset by including these samples with the additional group label of "mystery." The mystery dataset compromised of samples collected from a subset of the cities in the main dataset as well as samples collected from new cities. For samples from cities that belonged to the main dataset, error rates ranged from 18.18 to 72.7%. For samples from new cities that did not belong to the main dataset, 57.7% of the test samples could be correctly labeled as "mystery" samples. Furthermore, we also predicted some of the non-microbial features for the mystery samples from the cities that did not belong to main dataset to draw inferences and narrow the range of the possible sample origins using a multi-output multilayer perceptron algorithm.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。