Exploring the Potential of Adaptive, Local Machine Learning in Comparison to the Prediction Performance of Global Models: A Case Study from Bayer's Caco-2 Permeability Database

Abstract

Machine learning (ML) techniques are widely used to compensate for the lack of simple molecular design guidelines for newer therapeutic modalities in the extended and beyond rule of five chemical space (eRo5, bRo5). These techniques predict molecular properties directly from structure, allowing promising compounds to be prioritized. However, model performance varies greatly among ML use cases. One molecular property for which generalizing global models still struggle to reach sufficient performance is Caco-2 permeability. Accurate regression predictions have proven especially challenging in the lower permeability ranges, which are typical of the larger molecules belonging to the e/bRo5 space. The present study therefore identifies a suitable combination of ML algorithm and descriptors, namely the LightGBM algorithm with RDKit molecular property descriptors, that predicts Caco-2 permeability very efficiently with a simple global model. An additionally introduced local model uses the same algorithm and descriptors but selects its training data by Tanimoto fingerprint similarity so that it matches the structure of the individual test compound. Evaluating this adaptive model while systematically varying the number of most similar structures used for training shows that, compared with the global model, performance improved only marginally and only for specific training data constellations. Because these improvements appear random, general rules for local model parametrization cannot be derived a priori for the chosen algorithm and descriptor combination, and preselecting training data does not seem advantageous over global ML based on all available data. The creation of more data-efficient models, however, was generally shown to be possible.
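
The training data selection step of the local model can be illustrated with a minimal sketch. It is not the study's implementation: it assumes fingerprints are already available as sets of on-bit indices (in practice they would be computed with a cheminformatics toolkit such as RDKit), and the function names, compound labels, and the value of k are hypothetical. The abstract does not state which fingerprint type was used. The sketch ranks the training compounds by Tanimoto similarity to one test compound and keeps the k most similar ones, which would then serve as the training set for a per-compound regression model.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def select_local_training_set(
    query_fp: Set[int],
    training_fps: Dict[str, Set[int]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k training compounds most similar to the query compound."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in training_fps.items()]
    # Highest similarity first; ties are broken by name so the result is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy fingerprints (hypothetical data, for illustration only).
training_fps = {
    "cmpd_A": {1, 2, 3, 4},
    "cmpd_B": {1, 2, 3, 9},
    "cmpd_C": {7, 8, 9},
    "cmpd_D": {1, 2},
}
query_fp = {1, 2, 3, 5}

neighbours = select_local_training_set(query_fp, training_fps, k=2)
for name, similarity in neighbours:
    print(name, round(similarity, 3))
```

Varying k in such a selection corresponds to the systematic variation of the number of most similar structures described above; setting k to the size of the full data set recovers the global model's training data.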
