Do Pseudosequences Matter in Neoantigen Prediction?

Abstract

Computational prediction of neoantigens that elicit T cell responses is central to the development of personalized cancer vaccines. Many current predictors represent MHC class I alleles using selected subsets of residues, known as pseudosequences, yet the extent to which pseudosequence choice and encoding strategy influence predictive performance has not been systematically examined. This study addresses that gap by evaluating a range of MHC representations within the BigMHC EL framework. We compared pseudosequence definitions based on protein structure and evolutionary diversity, a randomly sampled pseudosequence baseline, pseudosequences of varying lengths, embeddings generated using the ESM-2 protein language model, and a graph-based annotation embedding derived from allele groupings. Models using biologically informed pseudosequences consistently outperformed the random baseline, underscoring the importance of residue selection. Protein structure and evolutionary diversity pseudosequences showed similar performance, likely reflecting overlap in residues near the peptide-binding groove. We also found that pseudosequences of approximately 30 to 35 residues produced the strongest performance. Lastly, ESM-2 and annotation-based embeddings outperformed the random baseline but did not surpass curated pseudosequences under the current setup. Together, these findings indicate that curated pseudosequences remain efficient representations of MHC alleles in neoantigen prediction models, while alternative encodings can approximate but not yet replace residue-level sequence information.
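To make the idea of a pseudosequence concrete, the following is a minimal sketch of how a residue subset might be pulled from an aligned MHC sequence, how a randomly sampled baseline of the same length could be drawn, and how the result could be one-hot encoded. It is not the BigMHC EL implementation. The toy sequence, the position list, and all function names are hypothetical and chosen only for illustration; real pseudosequence positions come from structural or evolutionary analyses not reproduced here.

```python
import random

# The 20 standard amino acids, used as the one-hot alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_pseudosequence(aligned_seq, positions):
    """Return the residues found at the given 0-based alignment positions."""
    return "".join(aligned_seq[i] for i in positions)


def random_positions(seq_len, k, seed=0):
    """Sample k distinct positions uniformly at random (baseline selection)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(seq_len), k))


def one_hot(pseudoseq):
    """Encode a pseudosequence as a list of 20-dimensional one-hot rows.

    Characters outside the alphabet (e.g. alignment gaps) map to an all-zero row.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in pseudoseq:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    return rows


# Hypothetical 60-residue toy sequence; NOT a real MHC allele.
toy_allele = AMINO_ACIDS * 3

# Hypothetical "curated" positions; real ones would come from structure or diversity data.
curated_positions = [6, 8, 23, 44, 58]

curated = extract_pseudosequence(toy_allele, curated_positions)
baseline = extract_pseudosequence(
    toy_allele, random_positions(len(toy_allele), len(curated_positions))
)
encoded = one_hot(curated)
```

Holding the length fixed while changing only which positions are selected mirrors the kind of comparison the abstract describes between biologically informed and randomly sampled pseudosequences; varying the number of positions would correspond to the length experiments.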
