Abstract
MOTIVATION: Long non-coding RNAs (lncRNAs) have emerged as crucial players in diverse physiological and pathological processes, yet the biological mechanisms of the vast majority of lncRNAs remain elusive. Filling this gap requires more accurate lncRNA identification and functional annotation.

RESULTS: Here, we introduce LncADeep 2.0, an integrated deep learning framework designed to meet these needs. In the identification module, LncADeep 2.0 incorporates novel peptide features along with sequence and structural information, and it outperforms our previous LncADeep and other existing tools on both annotated transcripts from GENCODE and RNA-seq data. For functional annotation, LncADeep 2.0 leverages lncRNA-centric interaction networks and gene ontology terms through a transfer learning strategy to achieve robust annotation performance with limited functional data. Compared with LncADeep, LncADeep 2.0 can accurately elucidate the general functions of given lncRNA sequences, predict tissue- or cell-type-specific functions from bulk and single-cell RNA-seq data, and establish connections between tumor-associated lncRNAs and genomic markers. Overall, LncADeep 2.0 is an efficient and reliable tool for lncRNA identification and functional annotation across a wide spectrum of biological processes.

AVAILABILITY AND IMPLEMENTATION: LncADeep 2.0 is available at https://github.com/Jefferson-Chou/LncADeep2 and https://doi.org/10.5281/zenodo.17164767.