Abstract
Transposable elements (TEs) are DNA sequences that can move within a genome. They constitute a substantial portion of eukaryotic genomes and play essential roles in gene regulation and genome evolution. Accurate classification of these repetitive elements is crucial for investigating their potential impact on the genome. Over the past few decades, several alignment-based tools have been developed to annotate TE types. Because these methods rely heavily on prior knowledge and are often computationally expensive, machine learning-based approaches have been proposed to overcome these limitations. However, most of these approaches fail to capture the multiscale features of TEs, resulting in suboptimal performance. Here, we propose a novel framework called CREATE, which integrates the global pattern distribution and the local sequence profile of TEs using Convolutional neural networks and Recurrent neural nEtworks with an Attention mechanism for efficient TE classification. Because TE groups are organized hierarchically, we trained nine classifiers, one for each parent node in the class hierarchy, and applied a top-down hierarchical classification strategy to classify unknown TEs as completely as possible. Comprehensive experiments demonstrate that CREATE outperforms existing TE-type annotation methods and achieves superior performance in hierarchical classification tasks. These results suggest that CREATE can improve the accuracy of TE annotation. The source code and demo data are available at https://github.com/yangqi-cs/CREATE.
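The top-down strategy mentioned above can be illustrated with a minimal sketch. It is not the CREATE implementation: the hierarchy below is a toy example rather than the nine parent nodes used in this work, and the names `HIERARCHY`, `make_stub`, `classify_top_down`, and the `threshold` stopping rule are assumptions introduced only for illustration. Each parent node has its own classifier, and a sequence is passed down from the root until it reaches a node without a classifier or a prediction falls below the threshold.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy hierarchy (parent -> children); NOT CREATE's actual class tree.
HIERARCHY: Dict[str, List[str]] = {
    "TE": ["ClassI", "ClassII"],
    "ClassI": ["LTR", "nonLTR"],
    "LTR": ["Copia", "Gypsy"],
}

# A per-node classifier maps a sequence to (predicted child, probability).
NodeClassifier = Callable[[str], Tuple[str, float]]


def make_stub(label: str, prob: float) -> NodeClassifier:
    """Return a stand-in classifier that always predicts `label` with `prob`.

    In practice each parent node would hold a trained model instead.
    """
    return lambda seq: (label, prob)


def classify_top_down(
    seq: str,
    classifiers: Dict[str, NodeClassifier],
    root: str = "TE",
    threshold: float = 0.5,  # assumed stopping rule, for illustration only
) -> List[str]:
    """Walk from the root towards the leaves, one parent-node classifier at a time."""
    path = [root]
    node = root
    while node in classifiers:
        child, prob = classifiers[node](seq)
        if child not in HIERARCHY.get(node, []):
            raise ValueError(f"{child!r} is not a child of {node!r}")
        if prob < threshold:
            break  # stop at the deepest level predicted with enough confidence
        path.append(child)
        node = child
    return path


if __name__ == "__main__":
    demo = {
        "TE": make_stub("ClassI", 0.9),
        "ClassI": make_stub("LTR", 0.8),
        "LTR": make_stub("Gypsy", 0.3),
    }
    print(classify_top_down("ACGTACGT", demo))
```

With the stub classifiers shown, the walk stops at the `LTR` level because the last prediction falls below the assumed threshold, so the sequence receives a partial label rather than a forced leaf assignment.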