Data-driven frameworks to robustly predict solubility parameter of diverse polymers

基于数据驱动的框架,可稳健地预测多种聚合物的溶解度参数

阅读:1

Abstract

This study intends to effectively forecast solubility parameter of diverse polymers by creating machine learning models that can grasp the complex relationships between essential input factors like molecular weight, melting point, boiling point, liquid molar volume, radius of gyration, dielectric constant, dipole moment, refractive index, van der Waals area and reduced volume, and parachor, alongside the target variable, which is solubility coefficient of polymers. The goal is to create strong models that accurately capture these intricate relationships to facilitate accurate forecasts of the solubility parameter for polymers. Multiple machine learning algorithms, ranging from basic methods like Linear Regression to advanced techniques such as Artificial Neural Networks (ANNs), Ridge Regression, Lasso Regression, Support Vector Machines (SVMs), Linear Regression, Random Forests (RFs), Gradient Boosting Machines (GBM), K-Nearest Neighbors (KNN), Elastic Net, Decision Trees, Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Convolutional Neural Networks (CNNs), and Extreme Gradient Boosting (XGBoost) were utilized. These methods were utilized to create data-driven models that adeptly seize the intricate connections between input characteristics and output variable, facilitating precise predictions of the solubility parameter for polymers. The efficacy of the developed models was rigorously evaluated using statistical metrics such as R², RMSE, and MRD%, along with visual tools including cross-plots, deviation plots, and SHAP analysis to enhance interpretability and predictive reliability. To guarantee the dataset's reliability, consisting of 1,799 datapoints on the solubility parameter of polymers, the Monte Carlo outlier detection algorithm was utilized. This stage verified the dataset's accuracy and appropriateness for model training and evaluation. Results indicated that the models CatBoost, ANN, and CNN surpassed other techniques, attaining superior accuracy shown by the highest R-squared values and the lowest error rates. Sensitivity analysis showed that every input feature impacted the target variable, while SHAP analysis determined that dielectric constant was the most significant factor influencing the solubility parameter of polymers. These results highlight the efficiency of the utilized machine learning methods and emphasize the vital importance of these input parameters in establishing the solubility parameter of polymers. This method not only verifies that the models can make accurate predictions but also provides valuable insights into the impact of input features on solubility parameters of polymers, enhancing algorithm interpretability and scientific understanding.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。