Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of interaction, such as protein-ligand or protein-protein binding. Here, we introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across five modalities, including proteins, small molecules, metal ions, lipids, and nucleic acids. ATOMICA is trained on 2,037,972 interaction complexes using self-supervised denoising and masking to generate embeddings of interaction interfaces at the levels of atoms, chemical blocks, and molecular interfaces. ATOMICA's latent space is compositional and captures physicochemical features shared across molecular classes, enabling representations of new molecular interactions to be generated by algebraically combining embeddings of interaction interfaces. The representation quality of this space improves with increased data volume and modality diversity. As in pre-trained natural language models, this scaling law implies predictable gains in performance as structural datasets expand. We construct modality-specific interfaceome networks, termed ATOMICANets, which connect proteins based on interaction similarity with ions, small molecules, nucleic acids, lipids, and proteins. By overlaying disease-associated proteins of 27 diseases onto ATOMICANets, we find strong associations for asthma in lipid networks and myeloid leukemia in ion networks. We use ATOMICA to annotate the dark proteome (proteins lacking known function) by predicting 2,646 uncharacterized ligand-binding sites, including putative zinc finger motifs and transmembrane cytochrome subunits. We experimentally confirm heme binding for five ATOMICA predictions in the dark proteome. By modeling molecular interactions, ATOMICA opens new avenues for understanding and annotating molecular function at scale.
Learning Universal Representations of Intermolecular Interactions with ATOMICA.
Authors: Fang Ada, Desgagné Michael, Zhang Zaixi, Zhou Andrew, Loscalzo Joseph, Pentelute Bradley L, Zitnik Marinka
| Journal: | bioRxiv | Impact factor: | 0.000 |
| Year: | 2025 | Citation: | 2025 Jul 15 |
| doi: | 10.1101/2025.04.02.646906 | ||
