SoMaCX: a complex generative genome modeling framework

Abstract

BACKGROUND: Somatic structural variations (SVs) are commonly observed in cancer tissue but remain challenging to discover with short- and long-read sequencing because of tumor heterogeneity and technical sequencing factors. Only SVs with a sufficient fraction of reads spanning the event are detectable, while phenomena such as chromothripsis significantly increase the complexity of the events and of their interpretation. Because structural variation is difficult to measure and reproduce in vivo, simulation frameworks are a logical way to determine realistic system limitations. Our generative modeling approach, soMaCX, uses distributions derived from data to drive simulations that approach real data.

RESULTS: Our generative framework includes mechanisms for biological conservation in the germline and for tissue composition in the somatic genome, along with regional distribution controls and complex SV generation that are not available in other systems. The system outputs FASTA, which can then be used as input to any downstream read simulator to produce Illumina, PacBio, 10x Genomics, Oxford Nanopore, and Bionano FASTQ data files; these are further processed into standard BAM files for SV calling.

CONCLUSIONS: The soMaCX framework provides superior generative modeling-based performance when compared to other simulation frameworks with respect to real data. Our open-source method introduces an important conceptual element to simulation by using biologically relevant regions (genes and regulatory elements) as the distribution controls, together with the biological modulation of known pathways (end-joining), leading to more detailed and realistic simulated genomes. By designing a generative method to explore the most difficult genomic conditions, we provide a means to measure germline variation calling performance and to calibrate the results for rare variants needed in the clinical setting.

We provide a Python 3 implementation at: https://github.com/timothyjamesbecker/somacx .
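
The background statement that only SVs with a sufficient fraction of spanning reads are detectable can be illustrated with a small calculation. The following is a minimal sketch and not part of soMaCX: the function names, the parameters (`purity`, `clone_fraction`, `min_support`), the equal-ploidy assumption for tumor and normal cells, and the binomial read-sampling model are all assumptions made here for illustration.

```python
from math import comb


def variant_read_fraction(purity: float, clone_fraction: float,
                          copies_with_sv: int = 1, ploidy: int = 2) -> float:
    """Expected fraction of reads at a locus that carry the SV.

    Assumes (hypothetically) that tumor and normal cells share the same
    ploidy at the locus, so the fraction is simply the share of tumor
    cells in the subclone times the share of their copies with the SV.
    """
    return purity * clone_fraction * copies_with_sv / ploidy


def detection_probability(coverage: int, read_fraction: float,
                          min_support: int) -> float:
    """P(at least `min_support` of `coverage` reads carry the SV).

    Each read is modeled as an independent Bernoulli draw (binomial model).
    """
    p_too_few = sum(
        comb(coverage, k)
        * read_fraction ** k
        * (1.0 - read_fraction) ** (coverage - k)
        for k in range(min_support)
    )
    return 1.0 - p_too_few


if __name__ == "__main__":
    # A heterozygous SV in a subclone that makes up 20% of a 50%-pure tumor.
    frac = variant_read_fraction(purity=0.5, clone_fraction=0.2)
    for cov in (30, 60, 120):
        p = detection_probability(cov, frac, min_support=3)
        print(f"coverage={cov:4d}  read fraction={frac:.3f}  P(detect)={p:.3f}")
```

Under these assumptions, a low-frequency subclonal event is supported by only a few percent of reads, so the chance of clearing a fixed read-support threshold depends strongly on coverage. This is the kind of limit a simulated genome with known tissue composition makes it possible to measure.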
