Simulation of adaptive immune receptors and repertoires with complex immune information to guide the development and benchmarking of AIRR machine learning

利用复杂的免疫信息模拟适应性免疫受体和受体库,以指导AIRR机器学习的开发和基准测试

阅读:1

Abstract

Machine learning (ML) has shown great potential in the adaptive immune receptor repertoire (AIRR) field. However, there is a lack of large-scale ground-truth experimental AIRR data suitable for AIRR-ML-based disease diagnostics and therapeutics discovery. Simulated ground-truth AIRR data are required to complement the development and benchmarking of robust and interpretable AIRR-ML methods where experimental data is currently inaccessible or insufficient. The challenge for simulated data to be useful is incorporating key features observed in experimental repertoires. These features, such as antigen or disease-associated immune information, cause AIRR-ML problems to be challenging. Here, we introduce LIgO, a software suite, which simulates AIRR data for the development and benchmarking of AIRR-ML methods. LIgO incorporates different types of immune information both on the receptor and the repertoire level and preserves native-like generation probability distribution. Additionally, LIgO assists users in determining the computational feasibility of their simulations. We show two examples where LIgO supports the development and validation of AIRR-ML methods: (i) how individuals carrying out-of-distribution immune information impacts receptor-level prediction performance and (ii) how immune information co-occurring in the same AIRs impacts the performance of conventional receptor-level encoding and repertoire-level classification approaches. LIgO guides the advancement and assessment of interpretable AIRR-ML methods.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。