This study aims to enhance the accuracy of predicting transposon-derived piRNAs through the development of a novel computational method, TranspoPred. TranspoPred leverages positional, frequency, and moment-based features extracted from RNA sequences. By integrating multiple deep learning networks, the objective is to create a robust tool for predicting transposon-derived piRNAs, thereby contributing to a deeper understanding of their biological functions and regulatory mechanisms. Piwi-interacting RNAs (piRNAs) are currently considered the most diverse and abundant class of small non-coding RNA molecules. Accurate identification of transposon-associated piRNAs can considerably advance the study of small ncRNAs and support the understanding of gametogenesis. First, a number of moments were adopted to convert the primary sequences into feature vectors. Bagging-, boosting-, and stacking-based ensemble classification approaches were then employed. The bagging approach used Random Forest (RF), Extra Trees (ET), and Decision Tree classifiers. The boosting approach used XGBoost (XGB), AdaBoost, and Gradient Boost. For the stacking method, k-Nearest Neighbor (KNN), Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Tree classifiers served as base learners, with a Neural Network (NN) as the meta-learner. The computational models were evaluated through 2 × 5-fold cross-validation, 10-fold cross-validation, and independent testing on datasets from three species: human, mouse, and Drosophila. The evaluation metrics were Accuracy (ACC), Specificity (SP), Sensitivity (SN), and the Matthews Correlation Coefficient (MCC), along with the F1 measure. The ensemble methods outperformed the others in almost all testing scenarios.
Notably, stacking achieved perfect scores for accuracy, specificity, sensitivity, and MCC in independent set testing on the human and Drosophila datasets, and nearly perfect scores on the mouse dataset. Independent set testing across species evaluates the generalizability and adaptability of the model to diverse data samples. The proposed method, TranspoPred, achieved excellent results on the human, mouse, and Drosophila datasets and outperformed other state-of-the-art techniques for predicting transposon-derived piRNAs. The proposed approaches show great potential for enhancing the accuracy of piRNA prediction, significantly aiding future research and the scientific community in the in-silico identification of piRNAs. The source code and datasets used in this study are available at https://github.com/MansoorAhmadRasheed/piRNA-codes-and-result
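The evaluation metrics named in the abstract (ACC, SN, SP, MCC, and F1) all follow from the four counts of a binary confusion matrix. The sketch below is an illustration using the standard textbook definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical and do not come from the study.

```python
from math import sqrt


def binary_metrics(tp, tn, fp, fn):
    """Compute ACC, SN, SP, F1 and MCC from confusion-matrix counts.

    Hypothetical helper for illustration; piRNA is assumed to be the
    positive class and non-piRNA the negative class.
    """
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of true piRNAs that were detected.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-piRNAs that were correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    # MCC uses all four cells; defined as 0 when any marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, chosen only to exercise the formulas.
    for name, value in binary_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```

A classifier that makes no errors (fp = fn = 0, with both classes present) scores 1.0 on every metric, which is what a "perfect score" in independent set testing corresponds to under these definitions.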
An ensemble strategy for piRNA identification through hybrid moment-based feature modeling.
Authors: Rasheed Mansoor Ahmed, Alkhalifah Tamim, Alturise Fahad, Khan Yaser Daanial
| Journal: | Scientific Reports | Impact factor: | 3.900 |
| Year: | 2025 | Volume/pages: | 2025 Aug 18; 15(1):30157 |
| DOI: | 10.1038/s41598-025-14194-7 | ||
