GAMMA: gap-aware motif mining under incomplete labeling with applications to MHC motifs

GAMMA:基于间隙感知的不完全标记基序挖掘及其在MHC基序中的应用

阅读:1

Abstract

MOTIVATION: Sequence motif identification is crucial for understanding molecular recognition, particularly in immune responses involving peptide binding to major histocompatibility complex (MHC) Class I molecules for antigen presentation to T cells. Traditionally, MHC Class I binding motifs are assumed to be contiguous and span nine amino acids. However, structural evidence suggests that binding may involve nonadjacent residues, challenging the assumptions of existing methods. RESULTS: In this study, we propose Gap-Aware Motif Mining Algorithm (GAMMA), a probabilistic framework designed to identify noncontiguous motifs under conditions of incomplete labeling. GAMMA employs Bayesian inference with Markov chain Monte Carlo sampling to jointly estimate motif parameters, binding locations, and the relative spacing between binding positions. Through extensive simulations and real-world applications to MHC Class I peptide datasets, GAMMA outperforms existing motif discovery tools such as GLAM2 in accurately localizing binding residues and identifying the underlying motifs. Notably, our results suggest that the true number of binding residues may be eight, fewer than the commonly assumed nine. In addition, for longer peptides, the model captures increased flexibility in the central region, consistent with structural observations that peptides may bulge in the middle. AVAILABILITY AND IMPLEMENTATION: The raw data and the source codes are available on GitHub (https://github.com/RanLIUaca/GAMMAmotif).

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。