Representative Random Sampling of Chemical Space

Abstract

The overwhelming majority of molecules remain unexplored, mostly because their sheer number precludes enumerating chemical space, the set of all such molecules. In practice, only subsets of chemical space are considered, but these subsets exhibit substantial bias, which prevents a data-driven characterization of chemical space itself. In this work, we provide a method to produce unbiased, representative random samples of chemical space without enumerating its constituent molecules, and to estimate the number of molecules in any custom chemical space. The approach applies to molecules that can be represented as graphs and runs efficiently even for molecules with 30 atoms. We use it to estimate how representative current databases are of their underlying chemical spaces, and we establish a necessary criterion, in the form of a lower bound on database size, for a database to be representative of its underlying chemical space.
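The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of counting and sampling without enumeration, not the authors' method. It assumes a toy stand-in for a constrained graph space (binary strings of length n with no two adjacent 1s, mimicking a valence-like rule), and every function name is hypothetical. Each object is grown by random constrained choices while recording the probability p of the path taken; averaging 1/p estimates the total count (Knuth's random-probe estimator), and rejection sampling with acceptance probability p_min / p turns the biased proposal into exactly uniform samples.

```python
import random


def random_path(n, rng):
    """Grow one length-n binary string with no two adjacent 1s by picking
    uniformly among the symbols still allowed at each step.

    Returns the string and the probability p that this proposal produced it.
    """
    symbols, p = [], 1.0
    for _ in range(n):
        # Constraint check: a "1" may not follow another "1".
        allowed = ["0"] if symbols and symbols[-1] == "1" else ["0", "1"]
        p /= len(allowed)
        symbols.append(rng.choice(allowed))
    return "".join(symbols), p


def estimate_count(n, samples, seed=0):
    """Knuth-style estimate of the number of valid strings: E[1/p] = count."""
    rng = random.Random(seed)
    return sum(1.0 / random_path(n, rng)[1] for _ in range(samples)) / samples


def uniform_sample(n, rng):
    """Exact uniform sample by rejection from the biased proposal.

    p_min = 2**-n is the smallest proposal probability (the all-zeros string),
    so accepting with probability p_min / p cancels the bias of the proposal.
    """
    p_min = 0.5 ** n
    while True:
        s, p = random_path(n, rng)
        if rng.random() < p_min / p:
            return s


if __name__ == "__main__":
    # For n = 10 the exact count is 144 (a Fibonacci number); the estimate
    # approaches it without listing any strings.
    print(estimate_count(10, 20000))
    print(uniform_sample(10, random.Random(1)))
```

Recording the path probability is what makes both tasks possible without enumeration. The price is efficiency: when the proposal is far from uniform, rejection discards many draws (here about 86% of proposals for n = 10).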
