Graph latent diffusion-based molecular representation learning for enhanced generalization in molecular property prediction

Abstract

This study evaluates the effect of latent diffusion models on molecular representation learning from the perspective of generalization performance in molecular property prediction. To this end, we formulate a deep generative model for molecular representation learning with a latent diffusion-based prior distribution, and introduce a methodology for evaluating the generalization of learned molecular representations using the widely applicable information criterion (WAIC) and the widely applicable Bayesian information criterion (WBIC). Furthermore, we propose an analysis framework based on smoothness and multimodality to identify the factors underlying generalization in molecular representations. We construct the graph latent diffusion autoencoder (Graph LDA), a deep molecular generative model that combines a transformer-based graph variational autoencoder with a latent diffusion-based prior distribution and learns graph-level molecular representations through unsupervised learning. Using WBIC and WAIC, we compare the generalization performance of Graph LDA with that of other molecular representation learning models across multiple molecular properties, including HOMO energy, solubility, and biological activities. The results show that molecular representations learned by different models exhibit distinct generalization behaviors, and that representations learned by Graph LDA, which uses a latent diffusion-based prior, consistently generalize better in molecular property prediction. Using the proposed framework, we show empirically that the superior generalization of Graph LDA is attributable to the smoothness and multimodality of its learned latent representations. These findings provide a principled understanding of the role that latent diffusion-based molecular representation learning plays in improving generalization performance.
Scientific contribution: This work systematically analyzes the effect of latent diffusion-based priors in molecular representation learning from the perspective of generalization performance in molecular property prediction. Through generalization evaluation with WBIC and WAIC, together with an analysis framework for molecular representations, we empirically demonstrate that latent diffusion-based priors help deep generative models extract smooth and multimodal latent representations, which in turn improve the generalization performance of molecular representations. These findings offer a principled guideline for developing molecular representation learning models that generalize well.
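
The abstract does not state how WAIC and WBIC are computed for a property-prediction model built on learned representations. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a hypothetical toy Bayesian linear regression with a single scalar weight on a one-dimensional representation feature, and the functions `sample_posterior`, `pointwise_loglik`, `waic`, and `wbic`, as well as all data, are illustrative. It follows Watanabe's definitions: WAIC = T_n + V_n/n, where T_n is the Bayes training loss and V_n the functional variance, and WBIC is the posterior mean of the total negative log-likelihood under the posterior tempered at inverse temperature β = 1/log n.

```python
import math
import random
import statistics


def log_normal_pdf(y, mean, sigma):
    # Log density of N(mean, sigma^2) evaluated at y.
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2 * sigma ** 2)


def sample_posterior(z, y, beta, sigma=1.0, tau=1.0, n_samples=2000, seed=0):
    # Hypothetical toy model: y_i = w * z_i + noise, noise ~ N(0, sigma^2),
    # prior w ~ N(0, tau^2). The posterior tempered at inverse temperature beta,
    # p_beta(w) ∝ prior(w) * prod_i p(y_i | w, z_i)^beta, is Gaussian here.
    precision = 1.0 / tau ** 2 + beta * sum(zi * zi for zi in z) / sigma ** 2
    mean = beta * sum(zi * yi for zi, yi in zip(z, y)) / (sigma ** 2 * precision)
    rng = random.Random(seed)
    return [rng.gauss(mean, 1.0 / math.sqrt(precision)) for _ in range(n_samples)]


def pointwise_loglik(ws, z, y, sigma=1.0):
    # L[s][i] = log p(y_i | w_s, z_i) for posterior sample s and data point i.
    return [[log_normal_pdf(yi, w * zi, sigma) for zi, yi in zip(z, y)] for w in ws]


def waic(L):
    # Watanabe's WAIC on the per-sample scale: T_n + V_n / n, with
    # T_n = -(1/n) sum_i log E_w[p(y_i | w)]  (Bayes training loss)
    # V_n = sum_i Var_w[log p(y_i | w)]         (functional variance)
    n = len(L[0])
    t_n, v_n = 0.0, 0.0
    for i in range(n):
        col = [row[i] for row in L]
        m = max(col)
        # Numerically stable log of the posterior mean of p(y_i | w).
        log_mean_p = m + math.log(sum(math.exp(c - m) for c in col) / len(col))
        t_n -= log_mean_p / n
        v_n += statistics.variance(col)
    return t_n + v_n / n


def wbic(L_tempered):
    # WBIC: mean, over samples from the beta = 1/log(n) tempered posterior,
    # of the total negative log-likelihood n * L_n(w).
    return statistics.fmean([-sum(row) for row in L_tempered])


if __name__ == "__main__":
    rng = random.Random(42)
    z = [rng.gauss(0.0, 1.0) for _ in range(50)]      # hypothetical 1-D representation feature
    y = [2.0 * zi + rng.gauss(0.0, 1.0) for zi in z]  # hypothetical property values
    n = len(z)
    ws = sample_posterior(z, y, beta=1.0)                          # ordinary posterior
    ws_t = sample_posterior(z, y, beta=1.0 / math.log(n), seed=1)  # tempered posterior
    print("WAIC (per sample):", waic(pointwise_loglik(ws, z, y)))
    print("WBIC (total):", wbic(pointwise_loglik(ws_t, z, y)))
```

In this form WAIC estimates the per-sample generalization loss, while WBIC approximates the total free energy (the negative log marginal likelihood), so lower is better for both but the two values are on different scales. With real multi-dimensional representations the scalar weight would become a weight vector, and the tempered posterior would generally have to be sampled with MCMC rather than in closed form.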
