A Dataset of Raman and Infrared Spectra as an Extension to the ChEMBL



Abstract

Raman spectroscopy and infrared (IR) spectroscopy are two important tools for determining the structure and bonding properties of molecules. With the development of deep learning methods in materials science, there is a growing demand for larger and more diverse quantum chemistry data, including spectral information. However, many spectra are still missing from current datasets. To address this problem, we used Gaussian09 to construct a dataset of Raman and IR spectra. So far, a total of 220,000 molecules have been extracted from ChEMBL. The number of molecules is increasing, and the dataset is updated regularly. The dataset comprises optimized geometries, vibrational frequencies, IR and Raman intensities, and energies, expanding both the breadth and depth of existing quantum chemistry collections. By providing high-fidelity, multidimensional feature sets, this resource enables the training and benchmarking of next-generation models for tasks including inferring substructures from spectroscopic fingerprints, assembling molecular structures from spectra, and predicting Raman or IR spectra for novel molecules.
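
As an illustration of how the vibrational frequencies and intensities described above might be used, the sketch below broadens a list of discrete lines into a continuous spectrum with Lorentzian line shapes. This is a minimal sketch and not the authors' pipeline: the function name `lorentzian_spectrum`, the 10 cm^-1 line width, and the three-mode example values are all assumptions made for illustration, and nothing here reflects the dataset's actual file format.

```python
import math

def lorentzian_spectrum(frequencies, intensities, grid, fwhm=10.0):
    """Broaden discrete vibrational lines into a continuous spectrum.

    frequencies: mode positions in cm^-1
    intensities: IR or Raman intensity of each mode
    grid: wavenumbers (cm^-1) at which the spectrum is evaluated
    fwhm: full width at half maximum of each line (assumed value)
    """
    if len(frequencies) != len(intensities):
        raise ValueError("frequencies and intensities must have equal length")
    half = fwhm / 2.0
    spectrum = []
    for x in grid:
        total = 0.0
        for nu, inten in zip(frequencies, intensities):
            # Area-normalized Lorentzian centered at nu, scaled by intensity
            total += inten * (half / math.pi) / ((x - nu) ** 2 + half ** 2)
        spectrum.append(total)
    return spectrum

# Hypothetical three-mode molecule (values invented for illustration)
freqs = [1000.0, 1600.0, 3000.0]
ir_intensities = [10.0, 50.0, 5.0]
grid = [float(w) for w in range(500, 3501)]  # 1 cm^-1 spacing

spec = lorentzian_spectrum(freqs, ir_intensities, grid)
peak = grid[spec.index(max(spec))]  # wavenumber of the strongest band
```

The same function applies to Raman intensities unchanged. A Gaussian or Voigt line shape could be substituted, and the appropriate line width depends on the experimental spectra being compared against.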
