nRCFV: a new, dataset-size-independent metric to quantify compositional heterogeneity in nucleotide and amino acid datasets

nRCFV:一种新的、与数据集大小无关的指标,用于量化核苷酸和氨基酸数据集中的组成异质性

阅读:1

Abstract

MOTIVATION: Compositional heterogeneity-when the proportions of nucleotides and amino acids are not broadly similar across the dataset-is a cause of a great number of phylogenetic artefacts. Whilst a variety of methods can identify it post-hoc, few metrics exist to quantify compositional heterogeneity prior to the computationally intensive task of phylogenetic tree reconstruction. Here we assess the efficacy of one such existing, widely used, metric: Relative Composition Frequency Variability (RCFV), using both real and simulated data. RESULTS: Our results show that RCFV can be biased by sequence length, the number of taxa, and the number of possible character states within the dataset. However, we also find that missing data does not appear to have an appreciable effect on RCFV. We discuss the theory behind this, the consequences of this for the future of the usage of the RCFV value and propose a new metric, nRCFV, which accounts for these biases. Alongside this, we present a new software that calculates both RCFV and nRCFV, called nRCFV_Reader. AVAILABILITY AND IMPLEMENTATION: nRCFV has been implemented in RCFV_Reader, available at: https://github.com/JFFleming/RCFV_Reader . Both our simulation and real data are available at Datadryad: https://doi.org/10.5061/dryad.wpzgmsbpn .

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。