Abstract
To identify RNA pseudouridine more effectively, we propose a new feature extraction method. First, the original sequence is converted into numerical sequences based on two physicochemical properties of dinucleotides, namely free energy and hydrophilicity, with one numerical sequence per property. Each numerical sequence then undergoes a discrete Fourier transform (DFT), and the amplitude of each DFT coefficient is calculated. In this way, an RNA sequence of length N, which contains N-1 adjacent dinucleotides, yields 2(N-1) features. Finally, we use a convolutional neural network with a dynamic fully connected layer for prediction. A random search algorithm determines the optimal number of fully connected layers and fine-tunes the model parameters, which allows the model complexity to adapt to the needs of different species and datasets. Experimental results show that our model, RSCNN-PseU, improves the identification of RNA pseudouridine.
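The feature extraction step can be illustrated with a minimal Python sketch. It assumes NumPy is available, and the function name `dft_amplitude_features` is hypothetical. The two property tables hold placeholder numbers only, not real free energy or hydrophilicity values, and should be replaced with published dinucleotide property values before any real use. The sketch maps each adjacent dinucleotide to a property value, applies the DFT to each resulting numerical sequence, and concatenates the amplitudes.

```python
import itertools

import numpy as np

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in itertools.product(NUCLEOTIDES, repeat=2)]

# PLACEHOLDER tables (assumption): arbitrary numbers used only to make the
# sketch runnable. They are NOT real physicochemical property values.
PLACEHOLDER_FREE_ENERGY = {d: -0.1 * (i + 1) for i, d in enumerate(DINUCLEOTIDES)}
PLACEHOLDER_HYDROPHILICITY = {d: 0.05 * (i + 1) for i, d in enumerate(DINUCLEOTIDES)}


def dft_amplitude_features(
    sequence,
    property_tables=(PLACEHOLDER_FREE_ENERGY, PLACEHOLDER_HYDROPHILICITY),
):
    """Return DFT amplitude features for an RNA sequence.

    A sequence of length N has N-1 adjacent dinucleotides, so with two
    property tables the result has 2(N-1) entries.
    """
    sequence = sequence.upper().replace("T", "U")
    # Slide a window of width 2 over the sequence to get N-1 dinucleotides.
    dinucleotides = [sequence[i:i + 2] for i in range(len(sequence) - 1)]

    features = []
    for table in property_tables:
        # Numerical sequence for one physicochemical property.
        signal = np.array([table[d] for d in dinucleotides], dtype=float)
        # Amplitude (absolute value) of each complex DFT coefficient.
        features.append(np.abs(np.fft.fft(signal)))
    return np.concatenate(features)


example = "AUGGCUACGUA"  # N = 11
features = dft_amplitude_features(example)
print(features.shape)  # (20,), i.e. 2 * (11 - 1)
```

Because the amplitudes are computed per property and then concatenated, the feature vector length depends only on the sequence length and the number of property tables, which matches the 2(N-1) count stated above.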