Low-complexity and highly robust barcodes for error-rich single molecular sequencing

低复杂度、高鲁棒性的条形码,适用于高错误率的单分子测序

阅读:1

Abstract

DNA barcodes are frequently corrupted due to insertion, deletion, and substitution errors during DNA synthesis, amplification and sequencing, resulting in index hopping. In this paper, we propose a new DNA barcode construction scheme that combines a cyclic block code with a predetermined pseudo-random sequence bit by bit to form bit pairs, and then converts the bit pairs to bases, i.e., the DNA barcodes. Then, we present a barcode identification scheme for noisy sequencing reads, which uses a combination of cyclic shifting and traditional dynamic programming to mark the insertion and deletion positions, and then performs erasure-and-error-correction decoding on the corrupted codewords. Furthermore, we verify the identification error rate of barcodes for multiple errors and evaluate the reliability of the barcodes in DNA context. This method can be easily generalized for constructing long barcodes, which may be used in scenarios with serious errors. Simulation results show that the bit error rate after identifying insertions/deletions is greatly reduced using the combination of cyclic shift and dynamic programming compared to using dynamic programming only. It indicates that the proposed method can effectively improve the accuracy for estimating insertion/deletion errors. And the overall identification error rate of the proposed method is lower than 10-5 when the probability of each base mutation is less than 0.1, which is the typical scenario in third-generation sequencing.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。