Singular value decomposition of protein sequences as a method to visualize sequence and residue space

蛋白质序列的奇异值分解作为一种可视化序列和残基空间的方法

阅读:1

Abstract

Singular value decomposition (SVD) of multiple sequence alignments (MSAs) is an important and rigorous method to identify subgroups of sequences within the MSA, and to extract consensus and covariance sequence features that define the alignment and distinguish the subgroups. This information can be correlated to structure, function, stability, and taxonomy. However, the mathematics of SVD is unfamiliar to many in the field of protein science. Here, we attempt to present an intuitive yet comprehensive description of SVD analysis of MSAs. We begin by describing the underlying mathematics of SVD in a way that is both rigorous and accessible. Next, we use SVD to analyze sequences generated with a simplified model in which the extent of sequence conservation and covariance between different positions is controlled, to show how conservation and covariance produce features in the decomposed coordinate system. We then use SVD to analyze alignments of two protein families, the homeodomain and the Ras superfamilies. Both families show clear evidence of sequence clustering when projected into singular value space. We use k-means clustering to group MSA sequences into specific clusters, show how the residues that distinguish these clusters can be identified, and show how these clusters can be related to taxonomy and function. We end by providing a description a set of Python scripts that can be used for SVD analysis of MSAs, displaying results, and identifying and analyzing sequence clusters. These scripts are freely available on GitHub.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。