Selecting the Right Similarity-Scoring Matrix

选择合适的相似度评分矩阵

阅读:1

Abstract

Protein sequence similarity searching programs like BLASTP, SSEARCH (UNIT 3.10), and FASTA use scoring matrices that are designed to identify distant evolutionary relationships (BLOSUM62 for BLAST, BLOSUM50 for SEARCH and FASTA). Different similarity scoring matrices are most effective at different evolutionary distances. "Deep" scoring matrices like BLOSUM62 and BLOSUM50 target alignments with 20 - 30% identity, while "shallow" scoring matrices (e.g. VTML10 - VTML80), target alignments that share 90 - 50% identity, reflecting much less evolutionary change. While "deep" matrices provide very sensitive similarity searches, they also require longer sequence alignments and can sometimes produce alignment overextension into non-homologous regions. Shallower scoring matrices are more effective when searching for short protein domains, or when the goal is to limit the scope of the search to sequences that are likely to be orthologous between recently diverged organisms. Likewise, in DNA searches, the match and mismatch parameters set evolutionary look-back times and domain boundaries. In this unit, we will discuss the theoretical foundations that drive practical choices of protein and DNA similarity scoring matrices and gap penalties. Deep scoring matrices (BLOSUM62 and BLOSUM50) should be used for sensitive searches with full-length protein sequences, but short domains or restricted evolutionary look-back require shallower scoring matrices.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。