panHiTE: A comprehensive and accurate pipeline for TE detection in large-scale population genomes

Abstract

Transposable elements (TEs) are key drivers of genomic variation and species evolution. Although advances in high-throughput sequencing have enabled population-scale identification of TE insertions, accurate detection across large and complex genomes remains challenging. Existing tools often struggle to efficiently process large genomes, recover low-copy elements, or accurately reconstruct full-length TEs, limiting comprehensive TE analyses. Here, we present panHiTE, a population-scale TE detection framework that introduces several methodological innovations. First, panHiTE employs a dynamically updated global TE library to avoid redundant detection of previously identified elements, improving computational efficiency and enabling application to extremely large genomes, such as the 15-Gb wheat genome. Second, to recover low-copy TEs that are frequently missed in individual genomes, panHiTE realigns candidate elements across population-scale genomes, enabling accurate reconstruction of full-length TEs across accessions. Third, because long terminal repeat retrotransposons constitute a major fraction of plant genomes, panHiTE integrates a deep-learning-based detection algorithm developed in this study, achieving higher sensitivity and precision than the state-of-the-art tool panEDTA in population-scale analyses. In addition, a fault-tolerant redundancy-removal algorithm efficiently groups divergent family members, generating TE libraries with more than 50% fewer sequences while doubling the number of Perfect TEs across 26 maize genomes. These advances enable panHiTE to deliver high-resolution TE annotations and accurately resolve TE-gene positional relationships, thereby facilitating the systematic identification of TE-induced differential expression loci (TIDELs). In 32 Arabidopsis accessions, panHiTE identifies 85 TIDELs associated with diverse biological functions and metabolic pathways. 
Overall, panHiTE provides a robust and scalable solution for population-scale TE discovery and functional characterization in complex plant genomes.
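
The first innovation, a dynamically updated global TE library, can be illustrated with a toy sketch. This is not panHiTE's implementation: the function names (`mask_known`, `build_global_library`, `toy_detect`), the exact-match masking, and the fixed motif list are all assumptions made for illustration. The real pipeline presumably relies on alignment-based matching and a full de novo detector. The sketch only shows the idea that each genome is screened against the library built so far, so the detector never re-examines elements already found.

```python
from typing import Callable, Dict, List


def mask_known(genome: str, library: List[str]) -> str:
    """Replace exact occurrences of known library sequences with runs of 'N'.

    Exact string matching is a stand-in (assumption) for real homology search.
    """
    for te in library:
        genome = genome.replace(te, "N" * len(te))
    return genome


def build_global_library(
    genomes: Dict[str, str],
    detect: Callable[[str], List[str]],
) -> List[str]:
    """Process genomes one by one, growing a shared library as we go."""
    library: List[str] = []
    for _name, seq in genomes.items():
        # Hide everything the library already explains, then run the
        # (expensive) detector only on what remains.
        masked = mask_known(seq, library)
        for te in detect(masked):
            if te not in library:
                library.append(te)
    return library


def toy_detect(seq: str) -> List[str]:
    """Hypothetical detector: reports which of two fixed motifs are present."""
    motifs = ("ACGTACGT", "TTGGCCAA")
    return [m for m in motifs if m in seq]


if __name__ == "__main__":
    genomes = {
        "g1": "AAACGTACGTAAA",
        "g2": "CCACGTACGTCCTTGGCCAACC",
    }
    print(build_global_library(genomes, toy_detect))
```

In this toy run the second genome contains both motifs, but the first one is masked before detection, so the detector reports only the motif that is new to the library.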
