High-fidelity in silico generation and augmentation of TCR repertoire data using generative adversarial networks

利用生成对抗网络进行高保真度计算机模拟生成和增强TCR库数据

阅读:1

Abstract

Engineered T-cell receptor (eTCR) systems rely on accurately generated T-cell receptor (TCR) sequences to enhance immunotherapy predictability and efficacy. The most variable and crucial part of the TCR receptor is the CDR3 sequence region. Current methods for generating CDR3 sequences, including motif-based and Markov models, struggle to generate reliable, diverse, and novel TCR sequences. In this study, we present the first application of Generative Adversarial Networks (GANs) for producing biologically reliable CDR3 sequences, using Long Short-Term Memory (LSTM) networks and LeakyReLU-based GANs. Our results show that LSTM models generate more diverse sequences with higher accuracy, lower discriminator loss, and higher AUC compared to LeakyReLU. However, LeakyReLU provides greater stability with a lower generator loss, achieving a total Pearson correlation score of over 0.9. Both models demonstrate the ability to produce highly realistic TCR sequences, as validated by t-SNE clustering, frequency distribution analysis, TCRd3 BLAST analysis, and in silico docking. These findings highlight the potential of GANs as a powerful tool for generating synthetic yet biologically relevant TCR sequences, a crucial step toward improving eTCR-based therapies. Further refinement of amino acid frequency distributions and clinical validation will enhance their applicability for therapeutic purposes.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。