M3Site: multiclass multimodal learning for protein active site identification and classification

Abstract

Accurately identifying and classifying protein active sites is crucial for understanding protein mechanisms, drug design, and synthetic biology. Current methods often rely on binary classification and single-modal data, which limits their scope. To address these limitations, we propose M$^{3}$Site, a multimodal framework that integrates protein sequence embeddings, structural graph representations, and functional text annotations for residue-level, multiclass active site prediction. Built upon a curated dataset of 25,883 proteins sourced from UniProt and AlphaFold2, M$^{3}$Site uses pretrained protein language models, equivariant graph neural networks, and biomedical language models for feature extraction. A function-informed cross-attention module performs cross-modal feature fusion, an adaptive weighted fusion mechanism balances the contributions of the modalities, and a compound loss function addresses class imbalance. Experimental results show that M$^{3}$Site significantly outperforms existing models, and an interactive application has been developed to enhance its practical utility for predictions and visualizations. The dataset, source code for experiments, and interactive application are publicly available at https://github.com/Gift-OYS/M3Site.
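
The abstract does not spell out how the adaptive weighted fusion balances modalities. As a rough illustration only, the sketch below assumes one common scheme: each modality gets a raw gate score, the scores are softmax-normalized into weights, and the per-residue feature vectors are combined as a weighted sum. The function names `softmax` and `weighted_fusion`, the gate scores, and the toy feature vectors are all hypothetical and are not taken from the M$^{3}$Site code base.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(modality_features, gate_scores):
    """Fuse one residue's feature vectors from several modalities.

    modality_features: list of equal-length vectors, one per modality.
    gate_scores: one raw score per modality (assumed to be learned
        during training; fixed here for illustration).
    Returns the fused vector and the normalized modality weights.
    """
    if len(modality_features) != len(gate_scores):
        raise ValueError("need exactly one gate score per modality")
    dim = len(modality_features[0])
    if any(len(vec) != dim for vec in modality_features):
        raise ValueError("all modality vectors must share one dimension")

    weights = softmax(gate_scores)
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, modality_features))
        for i in range(dim)
    ]
    return fused, weights


# Toy example: a sequence-derived and a structure-derived feature vector.
seq_feat = [1.0, 0.0]
struct_feat = [0.0, 1.0]
fused, weights = weighted_fusion([seq_feat, struct_feat], [0.0, 0.0])
```

With equal gate scores the two modalities contribute equally; raising one score shifts the fused vector toward that modality. A trained model would typically produce the scores from the features themselves rather than fix them by hand.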
