The Relevance of Goodness-of-fit, Robustness and Prediction Validation Categories of OECD-QSAR Principles with Respect to Sample Size and Model Type

OECD-QSAR 原则中拟合优度、稳健性和预测验证类别与样本量和模型类型的相关性

阅读:1

Abstract

We investigated the relevance of the validation principles on the Quantitative Structure Activity Relationship models issued by Organization for Economic and Co-operation and Development. We checked the goodness-of-fit, robustness and predictivity categories in linear and nonlinear models using benchmark datasets. Most of our conclusions are drawn using the sample size dependence of the different validation parameters. We found that the goodness-of-fit parameters misleadingly overestimate the models on small samples. In the case of neural network and support vector models, the feasibility of the goodness-of-fit parameters often might be questioned. We propose to use the simplest y-scrambling method to estimate chance correlation. We found that the leave-one-out and leave-many-out cross-validation parameters can be rescaled to each other in all models and the computationally feasible method should be chosen depending on the model type. We assessed the interdependence of the validation parameters by calculating their rank correlations. Goodness of fit and robustness correlate quite well over a sample size for linear models and one of the approaches might be redundant. In the rank correlation between internal and external validation parameters, we found that the assignment of good and bad modellable data to the training or the test causes negative correlations.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。