The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion

根据比对准确率标准,对各种类型的氨基酸替换矩阵进行范围界定

阅读:1

Abstract

BACKGROUND: The alignment of character sequences is important in bioinformatics. The quality of this procedure is determined by the substitution matrix and parameters of the insertion-deletion penalty function. These matrices are derived from sequence alignment and thus reflect the evolutionary process. Currently, in addition to evolutionary matrices, a large number of different background matrices have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins. RESULTS: We tested both the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum); structural alignment based matrices, contact energy matrix, and matrix based on the properties of the genetic code. This study presents results for two test set types: first, we simulated sequences that reflect the divergent evolution; second, we performed tests on Balibase sequences. In both cases, we obtained the dependences of the alignment quality (Accuracy, Confidence) on the evolutionary distance between sequences and the evolutionary distance to which the substitution matrices correspond. Optimization of a combination of matrices and the penalty parameters was carried out for local and global alignment on the values of penalty function parameters. Consequently, we found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We analysed the correspondence of the correlation coefficients of matrices to the alignment quality. It was found that matrices showing high quality alignment have an above average correlation value, but the converse is not true. CONCLUSIONS: This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. The same property is inherent in matrices not only of evolutionary origin, but also of another background corresponding to a large evolutionary distance. Therefore, matrices based on structural data show alignment quality close enough to its value for evolutionary matrices. This agrees with the idea that the spatial structure is more conservative than the protein sequence.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。