Abstract
Detecting corruption reports in text is complicated by two issues: the very large number of potentially useful features and the imbalance between target classes in the dataset. To address these issues, we propose a novel approach that combines feature description, feature selection, and deep reinforcement learning. The approach consists of four phases. The first phase is data preparation, which readies the texts for the subsequent steps. The second phase is feature extraction, which uses three feature types: statistical features, which describe each text in terms of frequencies and summary statistics; Term Frequency-Inverse Document Frequency (TF-IDF) features, which weight terms by their occurrence within each text and across the dataset; and Word2Vec features, which capture not only the importance of terms but also their co-occurrence and contextual relationships. In the third phase, the three feature sets are combined, and Singular Value Decomposition (SVD) is used to select an optimal subset of features and reduce the dimensionality of the dataset. In the fourth and final phase, a Convolutional Neural Network (CNN) performs the detection, with its configuration tuned by Q-learning. In experiments on identifying corruption reports in texts, the proposed approach achieves an average accuracy of 90.04% and an F-measure of 0.9. These results show that the method outperforms existing approaches and confirm its ability to identify positive samples in corruption-related texts more accurately.
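
As a rough illustration of the TF-IDF weighting mentioned in the second phase, the sketch below computes term weights for a toy corpus. It is a minimal example under stated assumptions, not the implementation used in this work: the function name `tfidf`, the sample sentences, and the choice of the plain `tf * log(N / df)` variant (no smoothing or normalization) are all hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: relative term frequency times log(N / df),
    without smoothing or vector normalization.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "official demanded bribe for permit".split(),
    "permit issued without delay".split(),
    "bribe paid to official".split(),
]
weights = tfidf(docs)
```

Terms that appear in fewer documents receive larger weights: in the first toy document, `demanded` (one document) is weighted above `permit` (two documents), and a term present in every document would receive a weight of zero.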