FM3VCF: A software library for accelerating the loading of large VCF files in genotype data analyses



Abstract

As genotype datasets grow, loading VCF files has become a computational bottleneck in analyses such as imputation and genome-wide association studies (GWAS). To address this, we developed FM3VCF (fast M3VCF), a software library that uses multiple CPU threads to accelerate VCF loading and to compress VCF files into the more compact M3VCF format. FM3VCF converts VCF files into M3VCF, the exclusive data format of MINIMAC4, and efficiently reads and parses data from VCF files. Compared with m3VCFtools, FM3VCF is approximately 36-fold faster at compressing VCF files to the M3VCF format, which addresses a limitation MINIMAC4 faces with datasets containing millions of samples. For reading compressed VCF files, including decompression and parsing, FM3VCF is approximately 3 times faster than HTSlib. FM3VCF is therefore an effective tool both for compressing VCF files and for accelerating the loading of large VCF files in genotype data analyses. By fully utilizing multiple CPU threads, it can significantly reduce the computational burden of various genomic analyses.
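The general idea of dividing VCF parsing across workers can be illustrated with a minimal Python sketch. This is not FM3VCF's implementation or API: the function names `parse_record`, `parse_chunk`, and `parse_vcf_lines`, and the `n_workers` and `chunk_size` parameters, are hypothetical and chosen only for illustration. The sketch assumes tab-separated VCF data lines with a `GT` entry in the FORMAT column. Note also that CPython threads do not run CPU-bound parsing in parallel, so this shows only the chunked division of work, not the speedups reported above.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_record(line):
    """Parse one VCF data line into (chrom, pos, ref, alt, genotypes)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt = fields[0], int(fields[1]), fields[3], fields[4]
    # FORMAT (column 9) names the per-sample subfields; locate GT within it.
    gt_index = fields[8].split(":").index("GT")
    genotypes = [sample.split(":")[gt_index] for sample in fields[9:]]
    return chrom, pos, ref, alt, genotypes


def parse_chunk(lines):
    """Parse a contiguous chunk of data lines."""
    return [parse_record(line) for line in lines]


def parse_vcf_lines(lines, n_workers=4, chunk_size=1000):
    """Split data lines into chunks and parse the chunks on a worker pool.

    Header lines (starting with '#') and blank lines are skipped.
    Record order is preserved because pool.map yields results in input order.
    """
    data = [l for l in lines if l.strip() and not l.startswith("#")]
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(parse_chunk, chunks)
        return [record for chunk in results for record in chunk]
```

Each record is parsed independently of the others, which is what makes this kind of line-oriented workload straightforward to split across threads in a compiled implementation.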
