Abstract
Corrosion is the predominant failure mechanism of marine steel, and accurate corrosion prediction is essential for effective maintenance and protection strategies. However, the limited availability of corrosion data constrains the accuracy and generalization of prediction models. This study introduces a novel integrated model for predicting marine corrosion from small samples. The model takes dynamic marine environmental factors and material properties as inputs and gives the corrosion rate as the output. First, a genetic algorithm (GA)-optimized machine learning framework is used to derive the optimal base model, GA-XGBoost. Second, a virtual sample generation method combining a Gaussian Mixture Model and a Regression Generative Adversarial Network (GMM-RegGAN) is proposed, and incorporating the generated virtual samples into the base model improves its prediction accuracy. The framework is validated on corrosion datasets from six types of marine steel. The results show that GA optimization substantially improves both the performance and the stability of the model, and that virtual sample generation reduces the root mean square error (RMSE) by 14.94%, the mean absolute error (MAE) by 15.55%, and the mean absolute percentage error (MAPE) by 14.04%. These results indicate that the proposed method offers a robust and effective framework for corrosion prediction when sample data are limited.
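For readers who want the reported error metrics spelled out, the following is a minimal sketch of how RMSE, MAE, and MAPE, and a percentage reduction between two models, are commonly computed. It is an illustration under stated assumptions, not the study's implementation: the abstract does not say how the reductions were calculated, so the relative form (baseline minus improved, divided by baseline) is assumed here, and the function names and all numeric values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero.
    """
    return 100.0 * sum(
        abs((t - p) / t) for t, p in zip(y_true, y_pred)
    ) / len(y_true)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline` (assumed form)."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical corrosion rates and predictions, for illustration only.
y_true = [0.10, 0.20, 0.40]
base_pred = [0.14, 0.16, 0.44]  # stand-in for a model trained on real samples only
aug_pred = [0.12, 0.18, 0.42]   # stand-in for a model trained with virtual samples added

for name, metric in (("RMSE", rmse), ("MAE", mae), ("MAPE", mape)):
    before = metric(y_true, base_pred)
    after = metric(y_true, aug_pred)
    print(f"{name}: {before:.4f} -> {after:.4f} "
          f"({relative_reduction(before, after):.2f}% reduction)")
```

Because MAPE divides by the true value, it weights errors on slowly corroding samples more heavily than RMSE or MAE do, which is one reason the three metrics are usually reported together.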