Robustness of Ancestral Sequence Reconstruction to Among-site and Among-lineage Evolutionary Heterogeneity

Abstract

Ancestral sequence reconstruction is typically performed using homogeneous evolutionary models, which assume that the same substitution propensities affect all sites and lineages. These assumptions are routinely violated: heterogeneous structural and functional constraints favor different amino acids at different sites, and these constraints often change among lineages as epistatic substitutions accrue at other sites. To evaluate how violations of the homogeneity assumption affect ancestral sequence reconstruction under realistic conditions, we developed site-specific substitution models and parameterized them using data from deep mutational scanning experiments on three protein families; we then used these models to perform ancestral sequence reconstruction on the empirical alignments and on alignments simulated under heterogeneous conditions derived from the experiments. Extensive among-site and among-lineage heterogeneity is present in these datasets, but the sequences reconstructed from the empirical alignments are almost identical whether heterogeneous or homogeneous models are used. Using models fit to deep mutational scanning data from distantly related proteins, in which mutational effects are very different, also has minimal impact on ancestral sequence reconstruction. The rare differences occur primarily where phylogenetic signal is weak: at fast-evolving sites and at nodes connected by long branches. When ancestral sequence reconstruction is performed on simulated data, errors in the reconstructed sequences become more likely as branch lengths increase, but incorporating heterogeneity into the model does not improve accuracy. These data establish that ancestral sequence reconstruction is robust to realistic forms of evolutionary heterogeneity that the model does not incorporate, because the primary determinant of the reconstructed sequence is phylogenetic signal, not the substitution model. The best way to improve accuracy is therefore not to develop more elaborate models but to apply ancestral sequence reconstruction to densely sampled alignments that maximize phylogenetic signal at the nodes of interest.
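To make the central argument concrete, here is a minimal sketch, not the authors' models or software: marginal reconstruction of the single internal node of a three-leaf star tree at one alignment column, under an F81-style model whose equilibrium frequencies stand in for site-specific amino-acid preferences. The four-letter alphabet, the frequency values, the observed leaf states, and the branch lengths are all hypothetical. The F81 form is used only because its transition probabilities have a closed form, which keeps the sketch free of matrix exponentiation; models used in practice have 20 amino-acid states and richer exchangeabilities.

```python
import math

# Hypothetical 4-letter alphabet (a stand-in for the 20 amino acids).
STATES = ("A", "L", "S", "V")


def f81_prob(i, j, t, pi):
    """P_ij(t) under an F81-style model with equilibrium frequencies pi,
    scaled so that t is measured in expected substitutions per site."""
    beta = 1.0 / (1.0 - sum(p * p for p in pi.values()))
    decay = math.exp(-beta * t)
    return (decay if i == j else 0.0) + (1.0 - decay) * pi[j]


def marginal_posterior(leaves, t, pi):
    """Posterior over the internal node's state in a star tree whose leaves
    are all at branch length t; the model is reversible, so the stationary
    frequencies serve as the prior at that node."""
    weights = {}
    for x in STATES:
        w = pi[x]
        for y in leaves:
            w *= f81_prob(x, y, t, pi)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


def map_state(leaves, t, pi):
    """Most probable ancestral state (maximum a posteriori)."""
    post = marginal_posterior(leaves, t, pi)
    return max(post, key=post.get)


# Homogeneous model: the same (uniform) frequencies at every site.
homogeneous = {s: 0.25 for s in STATES}
# Hypothetical site-specific preferences strongly favoring V at this site.
site_specific = {"A": 0.05, "L": 0.15, "S": 0.05, "V": 0.75}

leaves = ["L", "L", "V"]  # observed states at one alignment column
for t in (0.1, 2.0):
    print(t, map_state(leaves, t, homogeneous), map_state(leaves, t, site_specific))
```

Running it prints `0.1 L L` and then `2.0 L V`: with short branches both models reconstruct the same ancestral state because the data dominate, and the site-specific prior changes the answer only when long branches weaken the phylogenetic signal.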
