FastGA: fast genome alignment



Abstract

MOTIVATION: FastGA finds alignments between two genome sequences more than an order of magnitude faster than previous methods that have comparable sensitivity. Its speed is due to (i) a fully cache-local architecture involving only MSD radix sorts and merges, (ii) an algorithm for finding adaptive seed hits in a linear merge of sorted k-mer tables, and (iii) a variant of the Myers adaptive wave algorithm to find alignments around a chain of seed hits. It further stores alignments in a fraction of the space of a conventional CIGAR string using a trace-point encoding and our ONEcode data system introduced here.

RESULTS: For example, two 2 Gbp bat genomes are compared in 2.1 min with eight threads on an Apple laptop using 5.7 GB of memory and producing 1.05 million alignments covering 60% of each genome. Our ALN format file occupies 66 MB and in just 6 s can be converted to a standard 1.03 GB PAF file.

AVAILABILITY AND IMPLEMENTATION: FastGA is freely available at GitHub: http://www.github.com/thegenemyers/FASTGA along with utilities for viewing inputs, intermediates, and outputs and transforming ALN files to PSL or PAF with or without CIGAR strings and common formats. There is also a utility to chain FastGA's alignments and display them in a dot-plot view in PostScript files.
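To make point (ii) of the Motivation more concrete, the following is a minimal Python sketch of the general idea of finding seed hits by merging two sorted k-mer tables. It is an illustration only and not FastGA's implementation: it uses fixed-length k-mers rather than adaptive seeds, Python's built-in sort rather than an MSD radix sort, and the function names `sorted_kmer_table` and `merge_seed_hits` are hypothetical.

```python
def sorted_kmer_table(seq, k):
    """Return (k-mer, position) pairs sorted by k-mer, then by position.

    Assumption: fixed-length k-mers and a comparison sort stand in for
    the adaptive seeds and radix sorts named in the abstract.
    """
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def merge_seed_hits(table_a, table_b):
    """Scan two sorted k-mer tables in a single pass and pair equal k-mers.

    Each table is advanced monotonically, so the scan itself is linear in
    the table sizes; the output grows with the number of matching pairs.
    """
    hits = []
    i = j = 0
    while i < len(table_a) and j < len(table_b):
        kmer_a, kmer_b = table_a[i][0], table_b[j][0]
        if kmer_a < kmer_b:
            i += 1
        elif kmer_a > kmer_b:
            j += 1
        else:
            # Find the run of entries sharing this k-mer in each table.
            i_end = i
            while i_end < len(table_a) and table_a[i_end][0] == kmer_a:
                i_end += 1
            j_end = j
            while j_end < len(table_b) and table_b[j_end][0] == kmer_a:
                j_end += 1
            # Every pairing of positions across the two runs is a seed hit.
            for _, pos_a in table_a[i:i_end]:
                for _, pos_b in table_b[j:j_end]:
                    hits.append((pos_a, pos_b))
            i, j = i_end, j_end
    return hits


if __name__ == "__main__":
    table_a = sorted_kmer_table("ACGTAC", 3)
    table_b = sorted_kmer_table("TACGT", 3)
    print(merge_seed_hits(table_a, table_b))
```

Because both tables are sorted, neither index ever moves backward, which is what keeps memory access sequential in a merge-based design of this kind. A real aligner would additionally have to handle both strands and highly repetitive k-mers, which this sketch ignores.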
