GlycanGT: a pretrained graph transformer framework for glycan graph representation and generative learning


Abstract

Motivation: Glycans are highly diverse biological sequences, but their functional understanding has lagged behind proteins and nucleic acids. Many glycans remain ambiguously annotated, limiting computational analyses. Existing computational approaches are primarily graph-based, capturing local structural features but struggling to model global patterns and incomplete sequences.

Results: We present GlycanGT, a graph-transformer-based pretrained model for glycans. Glycans were represented as graphs of monosaccharides and glycosidic bonds, and the model was pretrained using a masked language modeling objective. GlycanGT demonstrated higher performance than existing methods across 8 benchmark tasks (e.g., 0.844 AUPRC for immunogenicity classification), and its embeddings formed biologically relevant clusters that recovered known N- and O-glycan categories. Moreover, GlycanGT accurately proposed candidates for ambiguous sequences, maintaining >80% top-5 accuracy for both monosaccharide and glycosidic bond predictions under high masking levels.

Availability and implementation: The source code used in this study is available at https://github.com/matsui-lab/GlycanGT and archived on Zenodo (DOI: 10.5281/zenodo.18636040); pretrained model weights are provided via Hugging Face (https://huggingface.co/Akikitani295/GlycanGT).

Contact: matsui.yusuke.d4@f.mail.nagoya-u.ac.jp

Supplementary information: Supplementary data are available at Bioinformatics online.
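To make the setup concrete, the sketch below illustrates the two ideas the abstract describes: a glycan encoded as a graph whose nodes are monosaccharides and whose edges are glycosidic bonds, and an MLM-style corruption step that masks node labels for the model to recover. This is a hypothetical, minimal illustration, not the authors' implementation; the node/edge encoding, the `mask_nodes` helper, and the masking ratio are all assumptions for exposition.

```python
import random

# Toy N-glycan core as a graph (illustrative, not the GlycanGT data format):
# nodes = monosaccharide residues, edges = (parent, child, glycosidic linkage).
nodes = ["GlcNAc", "GlcNAc", "Man", "Man", "Man"]
edges = [(0, 1, "b1-4"), (1, 2, "b1-4"), (2, 3, "a1-3"), (2, 4, "a1-6")]

MASK = "[MASK]"

def mask_nodes(labels, mask_ratio=0.4, seed=0):
    """MLM-style node masking: replace a fraction of monosaccharide
    labels with a mask token; return the corrupted sequence and the
    index -> original-label targets the model would be trained to predict."""
    rng = random.Random(seed)
    n_mask = max(1, int(len(labels) * mask_ratio))
    masked_idx = set(rng.sample(range(len(labels)), n_mask))
    corrupted = [MASK if i in masked_idx else lab for i, lab in enumerate(labels)]
    targets = {i: labels[i] for i in masked_idx}
    return corrupted, targets

corrupted, targets = mask_nodes(nodes)
```

A transformer over this graph would then embed each node (attending along the edges) and score candidate monosaccharides at each masked position; the abstract's completion of ambiguous sequences corresponds to ranking the top-k candidates at those positions.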
