Abstract
Against a background of worsening global water scarcity and serious water pollution, accurate water quality assessment and prediction are crucial for water resource management. To address the shortcomings of traditional methods in multi-parameter collaborative analysis, class imbalance handling, and nonlinear fitting, this study focuses on Heilongjiang Province, China, and constructs a multi-parameter watershed water quality level prediction system organized as "data preprocessing-feature dimensionality reduction-model optimization". The system combines Principal Component Analysis (PCA) with four classifiers: the C4.5 decision tree, the Backpropagation (BP) neural network, the Convolutional Neural Network (CNN), and the Long Short-Term Memory (LSTM) network. A total of 31,289 samples were collected from 23 monitoring stations in the region between May and October 2023. After the classes were balanced with SMOTE oversampling and the dimensionality was reduced with PCA while retaining the key information, the data were input into the four models. PCA-C4.5 achieved an overall accuracy of 87.26% but was weak at identifying classes with few samples, while PCA-BP achieved the highest overall accuracy, 94.52%. PCA-CNN (93.27%) was good at capturing local features, and PCA-LSTM (93.42%) was good at capturing temporal patterns. The results confirm the feasibility of comparing and optimizing multiple algorithms in combination with PCA dimensionality reduction, and they provide a new method for watershed monitoring. Future research could incorporate attention mechanisms or multi-source data to enhance model performance.
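As an illustration of the class-balancing step mentioned above, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the implementation used in this study: the function name `smote_oversample`, the neighbour count `k`, the random seed, and the toy feature values are all assumptions made for the example.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples (hypothetical helper).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two made-up features (not data from the study).
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by that class. In a full pipeline, oversampling of this kind would normally be applied to the training split only, before dimensionality reduction and model fitting.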