microTaboo: a general and practical solution to the k-disjoint problem

microTaboo:k-不相交问题的通用实用解决方案

阅读:1

Abstract

BACKGROUND: A common challenge in bioinformatics is to identify short sub-sequences that are unique in a set of genomes or reference sequences, which can efficiently be achieved by k-mer (k consecutive nucleotides) counting. However, there are several areas that would benefit from a more stringent definition of "unique", requiring that these sub-sequences of length W differ by more than k mismatches (i.e. a Hamming distance greater than k) from any other sub-sequence, which we term the k-disjoint problem. Examples include finding sequences unique to a pathogen for probe-based infection diagnostics; reducing off-target hits for re-sequencing or genome editing; detecting sequence (e.g. phage or viral) insertions; and multiple substitution mutations. Since both sensitivity and specificity are critical, an exhaustive, yet efficient solution is desirable. RESULTS: We present microTaboo, a method that allows for efficient and extensive sequence mining of unique (k-disjoint) sequences of up to 100 nucleotides in length. On a number of simulated and real data sets ranging from microbe- to mammalian-size genomes, we show that microTaboo is able to efficiently find all sub-sequences of a specified length W that do not occur within a threshold of k mismatches in any other sub-sequence. We exemplify that microTaboo has many practical applications, including point substitution detection, sequence insertion detection, padlock probe target search, and candidate CRISPR target mining. CONCLUSIONS: microTaboo implements a solution to the k-disjoint problem in an alignment- and assembly free manner. microTaboo is available for Windows, Mac OS X, and Linux, running Java 7 and higher, under the GNU GPLv3 license, at: https://MohammedAlJaff.github.io/microTaboo.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。