Phylogenetic tree-based amino acid sequence generation for proteomics data analysis of unknown species



Abstract

In bottom-up proteomics, selecting an appropriate protein amino acid sequence database is vital for reliable peptide identification. However, database-dependent identification excludes species with unsequenced genomes, which limits its comprehensiveness. This is a major challenge in microbiota proteomics, a rapidly developing field in which proteins in a sample are simultaneously assigned to species and analyzed using protein amino acid sequence databases derived from known genomes. We aimed to develop a method that extends the species diversity of the database by generating protein amino acid sequences of unknown species from the phylogenetic relationships among known species. To evaluate this approach, we generated the sequence of the Helicobacter pylori F16 strain from the phylogenetic relationships of 29 closely related strains (excluding F16). With sequence generation, the percentage of peptides matching those obtained from the reference F16 strain increased by 5%. To validate peptide identification, proteomics data from the F16 strain were analyzed using the generated sequence database. The number of peptide-spectrum matches decreased when the database was expanded by sequence generation, owing to a loss of sensitivity caused primarily by an increase in decoy hits. This loss of identification sensitivity in large-scale databases could be mitigated by introducing a novel score based on spectral matching, the Ion Cover Score. The sequence generation method used in the present study, together with scores based on spectral matching, could accelerate the development of proteomics.
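The peptide-level comparison described in the abstract (what share of the reference strain's peptides is also present in a given database) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `tryptic_peptides` and `percent_matched`, the simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), the length limits, and the toy sequences are all assumptions made for illustration.

```python
def tryptic_peptides(protein, min_len=7, max_len=30):
    """Return the set of fully tryptic peptides of one protein.

    Assumed rule: cleave after K or R unless the next residue is P,
    with no missed cleavages. Length limits are illustrative defaults.
    """
    peptides = set()
    start = 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.add(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        # C-terminal peptide that does not end in K or R
        peptides.add(protein[start:])
    return {p for p in peptides if min_len <= len(p) <= max_len}


def percent_matched(reference_proteins, database_proteins, **digest_args):
    """Percentage of reference peptides that also occur in the database."""
    ref = set().union(*(tryptic_peptides(p, **digest_args)
                        for p in reference_proteins))
    db = set().union(*(tryptic_peptides(p, **digest_args)
                       for p in database_proteins))
    if not ref:
        return 0.0
    return 100.0 * len(ref & db) / len(ref)


if __name__ == "__main__":
    # Toy sequences only; real input would be proteomes read from FASTA files.
    reference = ["AAAKCCCRDDDK"]   # stands in for the held-out strain
    known = ["AAAKCCCKDDDR"]       # stands in for related known strains
    generated = ["EEEKCCCRGGGK"]   # stands in for a generated sequence
    print(percent_matched(reference, known, min_len=1))
    print(percent_matched(reference, known + generated, min_len=1))
```

Comparing the value for the known-strain database with the value for the database expanded by generated sequences gives the kind of percentage-point gain the abstract reports, though the digestion settings used in the study may differ from those assumed here.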
