Abstract
The heterogeneous viscosity distribution of biodegraded heavy oil poses significant challenges for reservoir management. Although molecular markers (biomarkers) reflect the intensity of biodegradation, existing models fail to establish quantitative correlations between biomarker signatures and viscosity because of multicollinearity in high-dimensional geochemical data. This study develops an integrated machine learning framework to decode biomarker-viscosity relationships in heavy oils from the Songliao Basin. Our dual-phase methodology combines ridge regression, which mitigates multicollinearity, with a feedforward neural network (FFNN), which captures nonlinear interactions. Key biomarkers were identified through geochemical analysis of 17 heavy oil samples spanning biodegradation levels PM0 to PM6. By combining L2-regularized feature selection with neural network optimization, the hybrid model achieved R² = 0.99996 and RMSE = 3.39, outperforming standalone FFNN models (cross-validation R² improved from 0.032 to 0.99996). Reverse prediction experiments validated the biomarker response patterns, even in severely biodegraded oils. The proposed model can be applied to predict both heavy oil viscosity and its biomarker signatures, thereby improving reservoir management strategies. Additionally, this study offers a new perspective on characterizing and managing reservoirs of diverse geological backgrounds and origins, not only biodegraded heavy oil reservoirs.
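To illustrate why an L2 penalty helps when biomarker features are collinear, the following is a minimal sketch of closed-form ridge regression in NumPy. It is not the study's pipeline: the two "biomarker ratio" features and the viscosity target are synthetic, the penalty `alpha = 1.0` is an arbitrary assumption, and the FFNN stage is omitted. Setting `alpha = 0` recovers ordinary least squares, which assigns unstable, offsetting weights to nearly identical features, whereas ridge shrinks them toward a shared, stable value.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression on centered data.

    Solves (Xc^T Xc + alpha * I) w = Xc^T yc, then recovers the intercept.
    alpha = 0 reduces to ordinary least squares.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    n_features = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


rng = np.random.default_rng(0)
n = 17  # matches the study's sample count; all values below are synthetic
base = rng.normal(size=n)
# Two hypothetical, nearly collinear "biomarker ratio" features
X = np.column_stack([base, base + 1e-3 * rng.normal(size=n)])
# Hypothetical target that depends only on the shared signal
y = 3.0 * base + 0.1 * rng.normal(size=n)

w_ols, _ = ridge_fit(X, y, alpha=0.0)    # unregularized: weights are unstable
w_ridge, _ = ridge_fit(X, y, alpha=1.0)  # L2 penalty: weights split the signal evenly
print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

The ridge weights come out nearly equal and sum to slightly less than the true combined effect of 3.0, which is the expected shrinkage; the OLS weights can differ wildly in sign and magnitude from one noise draw to the next.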