Pattern detection and string matching are fundamental problems in computer science and the accelerated expansion of bioinformatics and computational biology have made them a core topic for both disciplines. The requirement for computational tools for genomic analyses, such as sequence alignment, is very important, although, in most cases the resources and computational power required are enormous. The presented Multiple Genome Analytics Framework combines data structures and algorithms, specifically built for text mining and (repeated) pattern detection, that can help to efficiently address several computational biology and bioinformatics problems, concurrently, with minimal resources. A single execution of advanced algorithms, with space and time complexity O(nlogn), is enough to acquire knowledge on all repeated patterns that exist in multiple genome sequences and this information can be used as input by meta-algorithms for further meta-analyses. For the proof of concept and technology of the proposed Framework scalability, agility and efficiency, a publicly available dataset of more than 300,000 SARS-CoV-2 genome sequences from the National Center for Biotechnology Information has been used for the detection of all repeated patterns. These results have been used by newly introduced algorithms to provide answers to questions such as common patterns among all variants, sequence alignment, palindromes and tandem repeats detection, different organism genome comparisons, polymerase chain reaction primers detection, etc.
Multiple genome analytics framework: The case of all SARS-CoV-2 complete variants.
阅读:4
作者:Xylogiannopoulos, Konstantinos, F
| 期刊: | Journal of Biotechnology | 影响因子: | 3.900 |
| 时间: | 2022 | 起止号: | 2022 Nov 20; 359:130-141 |
| doi: | 10.1016/j.jbiotec.2022.09.015 | ||
特别声明
1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。
2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。
3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。
4、投稿及合作请联系:info@biocloudy.com。
