Abstract
RNA 5-hydroxymethylcytosine (5hmC) is a post-transcriptional modification that plays an important role in the regulation of gene expression and in its stability and evolution. Such modifications underscore the need to understand their broader biological impact. Although several sequencing technologies support the precise experimental determination of 5hmC modifications, these experimental approaches are laborious and costly; effective computational support is therefore needed. In this work, we introduce hm5C-DeepPred, a deep learning-based model that incorporates composite features, including positional, relational and configurational properties of RNA sequences, derived from moment-based characteristic features. The model uses a highly tuned convolutional neural network as its core predictor, complemented by fine-tuned transformer models as strong comparative baselines. The generalizability and robustness of hm5C-DeepPred were verified by 10-fold cross-validation, statistical significance testing, an internal independent validation set and cross-species external testing. Our results showed that hm5C-DeepPred outperformed other predictors on these benchmarks, achieving an accuracy of 93.34% and a Matthews correlation coefficient (MCC) of 0.8942. We also used explainable AI techniques to interpret feature contributions and to pinpoint the sequence elements that contributed most to model decisions. Hence, hm5C-DeepPred is an effective and reliable computational tool with the potential to support highly reproducible, large-scale analysis of RNA 5hmC modifications. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s13040-025-00517-x.
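For readers unfamiliar with the two headline metrics, the following minimal sketch shows how accuracy and MCC are conventionally computed from a binary confusion matrix. It is an illustration only and not the authors' evaluation code; the function name `accuracy_and_mcc` and the counts in the usage line are hypothetical and are not results from this study.

```python
import math


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple[float, float]:
    """Return (accuracy, MCC) for a binary confusion matrix.

    tp/tn/fp/fn are true-positive, true-negative, false-positive and
    false-negative counts. Names are illustrative assumptions.
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    accuracy = (tp + tn) / total

    # Standard MCC definition; by convention it is set to 0 when any
    # marginal total is zero and the denominator vanishes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Hypothetical counts chosen for easy arithmetic, not taken from the paper.
acc, mcc = accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5)
print(f"accuracy={acc:.4f}, MCC={mcc:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside accuracy for modification-site predictors.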