T-shaped alignments integrating HIV-1 near full-length genome and partial pol sequences can improve phylogenetic inference of transmission clusters

Abstract

Molecular epidemiology and the reconstruction of HIV-1 transmission networks can provide insights into transmission dynamics and inform public health strategies. Long HIV sequences, such as near full-length (nFL) genomes, can improve the accuracy of phylogenetic inference. However, relatively short pol sequences are still broadly used for inferring molecular HIV clusters. Whether a mix of long and short HIV-1 sequences can improve phylogenetic inference of molecular HIV clusters remains unknown. We propose a flexible approach called T-shaped alignments that incorporates both nFL HIV-1 genomes and partial pol sequences, and investigate whether this approach improves phylogenetic reconstruction of molecular clusters. Under the assumption that clustering from 100% long sequences is the most accurate, we obtained 1196 subtype B nFL HIV-1 sequences from the Los Alamos National Laboratory Database and a single-study subset, varied the proportion of long and short sequences in our T-shaped alignments by systematically masking all non-pol regions with missing characters in proportional increments, and compared tree similarity and cluster inference among datasets. With the full dataset, we found that when more than 50% of available sequences are nFL, the T-shaped alignment gradually yields results closer to those of the 100% nFL alignment, with more and larger clusters identified. Below the 50% threshold, however, accuracy did not increase. Stringent bootstrap thresholds narrowed the gaps in cluster accuracy but also decreased the number of clusters found and the mean cluster size. For the subset dataset, we found that introducing nFL sequences into the T-shaped alignment improves clustering accuracy either after a 30% threshold or immediately, depending on the bootstrap threshold chosen. Our new approach and results suggest that using T-shaped alignments to mix HIV-1 sequences of different lengths can improve phylogenetic and clustering accuracy, with the required nFL proportion depending on the goals of the analysis.
The T-shaped alignment provides a straightforward method for utilizing all available sequences to improve phylogenetic analysis.
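The masking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_t_shaped_alignment`, the column indices `pol_start` and `pol_end`, the random choice of which sequences stay full-length, and the use of `N` as the missing character are all assumptions made for illustration. The input is assumed to be an already aligned set of nFL sequences of equal length.

```python
import random


def build_t_shaped_alignment(alignment, pol_start, pol_end, nfl_fraction,
                             seed=0, missing="N"):
    """Return a copy of `alignment` in which only a fraction of sequences
    keep their full length; the rest retain the pol columns only.

    alignment    : dict mapping sequence name -> aligned sequence (equal lengths)
    pol_start/end: hypothetical 0-based, half-open column range of the pol region
    nfl_fraction : proportion of sequences left as near full-length (0.0 to 1.0)
    missing      : character written over masked columns (assumed here to be "N")
    """
    if not 0.0 <= nfl_fraction <= 1.0:
        raise ValueError("nfl_fraction must be between 0 and 1")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same aligned length")

    names = sorted(alignment)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    n_full = round(nfl_fraction * len(names))
    keep_full = set(rng.sample(names, n_full))

    result = {}
    for name in names:
        seq = alignment[name]
        if name in keep_full:
            result[name] = seq  # stays near full-length
        else:
            # Mask every column outside pol, keeping the alignment width.
            left = missing * pol_start
            right = missing * (len(seq) - pol_end)
            result[name] = left + seq[pol_start:pol_end] + right
    return result


if __name__ == "__main__":
    toy = {
        "seq1": "ACGTACGTACGT",
        "seq2": "ACGTTCGTACGA",
        "seq3": "ACGAACGTACGT",
        "seq4": "TCGTACGAACGT",
    }
    # Keep half of the toy sequences full-length, mask the others outside columns 4..8.
    for name, seq in build_t_shaped_alignment(toy, 4, 8, 0.5).items():
        print(name, seq)
```

Calling the function repeatedly with `nfl_fraction` stepped from 0.0 to 1.0 would produce the series of alignments with increasing nFL proportions; each could then be passed to a tree-building and cluster-detection tool of choice for comparison against the 100% nFL result.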
