Abstract
With the development of sequencing technology, it has been confirmed that many noncoding RNAs (ncRNAs) participate in various biological processes. Although significant progress has been made in the classification of ncRNA families, numerous challenges remain. On the one hand, accurately characterizing the biological properties of ncRNAs is technically challenging, because it requires capturing not only the information in the sequence itself but also the structural information of the ncRNA. On the other hand, different types of ncRNAs differ significantly in sequence length, structure, and function, so extracting valid information from sequences for accurate classification remains a difficult problem. To address these issues, we propose a novel 3D graphical representation method based on the Z-curve and the chaos game representation of RNA secondary structure to mine the underlying information of bases. This method converts an RNA secondary structure into a set of 3D graphical representations of the sequence under different base classifications. We have verified the effectiveness of this 3D graphical representation method on viral sequence datasets. Building on it, we propose an ncRNA family classification model called nRMFCA, based on multifeature fusion and convolutional block attention residual networks. nRMFCA was compared with previously proposed ncRNA family prediction methods on two commonly used public ncRNA datasets, NCY and nRC. The results demonstrate that nRMFCA outperforms the other prediction methods on both datasets. Overall, by integrating a novel 3D graphical representation method with a multifeature fusion-based convolutional block attention residual network, the nRMFCA model achieves better classification of ncRNA families, providing a powerful tool for in-depth research on ncRNAs.
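For readers unfamiliar with the Z-curve, the following minimal Python sketch computes the classical Z-curve of an RNA primary sequence. It illustrates only how three base classifications (purine/pyrimidine, amino/keto, weak/strong hydrogen bonding) map a sequence to a 3D curve. It is not the representation proposed here, which operates on RNA secondary structure and also draws on the chaos game representation. The function name `z_curve` and the choice to skip ambiguous symbols are assumptions made for illustration.

```python
def z_curve(seq):
    """Return the cumulative classical Z-curve points (x, y, z) of an RNA sequence.

    x: purine (A, G) vs. pyrimidine (C, U)
    y: amino (A, C) vs. keto (G, U)
    z: weak H-bond (A, U) vs. strong H-bond (G, C)
    """
    # Per-base step along each axis under the three base classifications.
    steps = {
        "A": (1, 1, 1),
        "G": (1, -1, -1),
        "C": (-1, 1, -1),
        "U": (-1, -1, 1),
    }
    x = y = z = 0
    points = []
    # Treat T as U so DNA-style input is accepted (an illustrative assumption).
    for base in seq.upper().replace("T", "U"):
        if base not in steps:
            continue  # assumption: silently skip ambiguous symbols such as N
        dx, dy, dz = steps[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    for point in z_curve("AGCU"):
        print(point)
```

Each point is the running difference between the two classes on each axis, so the shape of the curve reflects how the composition changes along the sequence.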