Harvesting more reads from single-cell combinatorial barcoding data with scarecrow

Abstract

SUMMARY: Combinatorial barcoding technologies for single-cell nucleotide sequencing, such as split-pool ligation protocols, involve sequential rounds of cell barcoding to uniquely tag individual cells. The rapid adoption of combinatorial barcoding in recent years is due in part to its scalability across cells and samples. However, small shifts in barcode positions within sequencing reads, caused by technical artifacts (e.g., during barcode incorporation or synthesis), can impair the accurate assignment of reads to cell barcodes. Existing processing tools typically assume that barcodes are fixed-length nucleotide sequences located at fixed positions within reads, overlooking any positional variability. Consequently, reads containing truncated or mispositioned barcodes are discarded during initial data processing steps, leading to significant data loss. To address this limitation and maximize the retention of sequencing reads from single-cell combinatorial barcoding experiments, we introduce scarecrow. This tool screens a subsample of reads to generate position-specific barcode profiles, which are then used to flexibly identify barcode sequences in each read whilst accounting for positional errors, a phenomenon we refer to as "jitter". Barcode matches are then prioritized to minimize nucleotide mismatches and the degree of jitter. These initial profiles are subsequently used to extract and error-correct barcode combinations in high-throughput sequencing libraries. By incorporating jitter into barcode error correction, scarecrow enables greater data recovery and improved downstream single-cell analyses. Scarecrow is fully open access, implemented in Python, and generates output files using standardized sequence file formats for maximal interoperability. A detailed explanation of the scarecrow workflow can be found in the supplementary materials.
AVAILABILITY AND IMPLEMENTATION: Scarecrow is freely available on GitHub (https://github.com/MorganResearchLab/scarecrow) and Zenodo (https://doi.org/10.5281/zenodo.18621784).
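
ILLUSTRATIVE SKETCH: The jitter-aware matching described in the summary can be illustrated with a minimal Python sketch. This is not scarecrow's code or API. The function names, the whitelist, and the thresholds (max_jitter, max_mismatches) are hypothetical, and mismatches are counted as a simple Hamming distance. The sketch scans a window around an expected barcode position and keeps the candidate with the fewest mismatches, breaking ties by the smallest absolute jitter.

```python
from typing import Iterable, NamedTuple, Optional


class BarcodeMatch(NamedTuple):
    barcode: str     # whitelist barcode that was matched
    start: int       # 0-based start of the match within the read
    jitter: int      # signed offset from the expected start position
    mismatches: int  # Hamming distance between read segment and barcode


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(
    read: str,
    whitelist: Iterable[str],
    expected_start: int,
    max_jitter: int = 2,       # hypothetical threshold, for illustration only
    max_mismatches: int = 1,   # hypothetical threshold, for illustration only
) -> Optional[BarcodeMatch]:
    """Return the best whitelist barcode near expected_start, or None.

    Candidates are ranked by (mismatches, |jitter|); on an exact tie the
    first candidate encountered is kept.
    """
    barcodes = list(whitelist)
    best: Optional[BarcodeMatch] = None
    for offset in range(-max_jitter, max_jitter + 1):
        start = expected_start + offset
        for barcode in barcodes:
            end = start + len(barcode)
            if start < 0 or end > len(read):
                continue  # candidate window falls outside the read
            mm = hamming(read[start:end], barcode)
            if mm > max_mismatches:
                continue
            if best is None or (mm, abs(offset)) < (best.mismatches, abs(best.jitter)):
                best = BarcodeMatch(barcode, start, offset, mm)
    return best


if __name__ == "__main__":
    whitelist = ["ACGTACGT", "TTTTGGGG"]
    # The barcode sits two bases downstream of its expected position.
    print(best_match("GGACGTACGTTTTT", whitelist, expected_start=0))
```

A fixed-position parser would reject the example read, because the eight bases at position 0 do not match any whitelist barcode; allowing a small jitter window recovers it.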
