Extended similarity indices (i.e., generalizations of pairwise similarity) have recently gained importance because of their simplicity, fast computation, and superiority in tasks such as diversity picking. However, they depend on several metaparameters that should be optimized. Earlier, we extended the binary similarity indices to 'discrete non-binary' and 'continuous' data; here we introduce and compare multiple weighting functions. As a case study, the similarity of CYP enzyme inhibitors (4016 molecules after curation) was characterized by their extended similarities, based on 2D descriptors, MACCS, and Morgan fingerprints. A statistical workflow based on sum of ranking differences (SRD) and analysis of variance (ANOVA) was used to find the optimal weight function(s). Overall, the best weighting function is the fraction ("frac"), which is consistent with the principle of parsimony. Optimal extended similarity indices were also identified, and their differences across data sets are reported. We intend this work to be a guideline for users of extended similarity indices regarding the various weighting options available. Source code for the calculations is available at https://github.com/mqcomplab/MultipleComparisons.
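To make the role of a weighting function concrete, the following is a minimal sketch of an extended, simple-matching-style index over a set of binary fingerprints with a fraction-type weight. It is not taken from the linked repository: the function name `extended_simple_matching`, the default coincidence threshold (`n % 2`), and the exact weight formulas are assumptions chosen for illustration, and the published definitions may differ.

```python
def extended_simple_matching(fingerprints, threshold=None):
    """Sketch of an n-ary simple-matching-style index with 'frac'-like weights.

    fingerprints: list of equal-length lists of 0/1 bits (one per molecule).
    threshold: coincidence threshold; columns whose |2c - n| exceeds it are
               counted as similarity, the rest as dissimilarity (assumption).
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    if threshold is None:
        threshold = n % 2  # assumed minimal threshold

    # Column sums: how many molecules have each bit switched on.
    col_sums = [sum(col) for col in zip(*fingerprints)]

    w_sim = 0.0  # weighted count of "similar" columns
    w_dis = 0.0  # weighted count of "dissimilar" columns
    for c in col_sums:
        delta = abs(2 * c - n)  # 0 = evenly split column, n = unanimous column
        if delta > threshold:
            w_sim += delta / n  # assumed fraction weight for similarity
        else:
            w_dis += 1 - (delta - n % 2) / n  # assumed dissimilarity weight

    total = w_sim + w_dis
    return w_sim / total if total else 0.0


if __name__ == "__main__":
    fps = [
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
    ]
    print(extended_simple_matching(fps))
```

In this sketch, a column on which all molecules agree contributes a full weight of 1, while a column with only partial agreement contributes proportionally less; a set of identical fingerprints therefore scores 1.0.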
Alternative weighting schemes for fine-tuned extended similarity indices.
Authors: López Pérez Kenneth, Rácz Anita, Bajusz Dávid, Gonzalez Camila, Héberger Károly, Miranda-Quintana Ramón Alain
| Journal: | Journal of Chemometrics | Impact factor: | 2.100 |
| Year: | 2024 | Issue: | 2024 Sep |
| DOI: | 10.1002/cem.3558 | | |
