A unified statistical framework for sequence comparison and structure comparison

用于序列比对和结构比对的统一统计框架

阅读:1

Abstract

We present an approach for assessing the significance of sequence and structure comparisons by using nearly identical statistical formalisms for both sequence and structure. Doing so involves an all-vs.-all comparison of protein domains [taken here from the Structural Classification of Proteins (scop) database] and then fitting a simple distribution function to the observed scores. By using this distribution, we can attach a statistical significance to each comparison score in the form of a P value, the probability that a better score would occur by chance. As expected, we find that the scores for sequence matching follow an extreme-value distribution. The agreement, moreover, between the P values that we derive from this distribution and those reported by standard programs (e.g., BLAST and FASTA validates our approach. Structure comparison scores also follow an extreme-value distribution when the statistics are expressed in terms of a structural alignment score (essentially the sum of reciprocated distances between aligned atoms minus gap penalties). We find that the traditional metric of structural similarity, the rms deviation in atom positions after fitting aligned atoms, follows a different distribution of scores and does not perform as well as the structural alignment score. Comparison of the sequence and structure statistics for pairs of proteins known to be related distantly shows that structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. The comparison also indicates that there are very few pairs with significant similarity in terms of sequence but not structure whereas many pairs have significant similarity in terms of structure but not sequence.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。