Discovering single nucleotide variants and indels from bulk and single-cell ATAC-seq

从批量和单细胞ATAC-seq数据中发现单核苷酸变异和插入缺失。

阅读:1

Abstract

Genetic variants and de novo mutations in regulatory regions of the genome are typically discovered by whole-genome sequencing (WGS), however WGS is expensive and most WGS reads come from non-regulatory regions. The Assay for Transposase-Accessible Chromatin (ATAC-seq) generates reads from regulatory sequences and could potentially be used as a low-cost 'capture' method for regulatory variant discovery, but its use for this purpose has not been systematically evaluated. Here we apply seven variant callers to bulk and single-cell ATAC-seq data and evaluate their ability to identify single nucleotide variants (SNVs) and insertions/deletions (indels). In addition, we develop an ensemble classifier, VarCA, which combines features from individual variant callers to predict variants. The Genome Analysis Toolkit (GATK) is the best-performing individual caller with precision/recall on a bulk ATAC test dataset of 0.92/0.97 for SNVs and 0.87/0.82 for indels within ATAC-seq peak regions with at least 10 reads. On bulk ATAC-seq reads, VarCA achieves superior performance with precision/recall of 0.99/0.95 for SNVs and 0.93/0.80 for indels. On single-cell ATAC-seq reads, VarCA attains precision/recall of 0.98/0.94 for SNVs and 0.82/0.82 for indels. In summary, ATAC-seq reads can be used to accurately discover non-coding regulatory variants in the absence of whole-genome sequencing data and our ensemble method, VarCA, has the best overall performance.

特别声明

1、本页面内容包含部分的内容是基于公开信息的合理引用;引用内容仅为补充信息,不代表本站立场。

2、若认为本页面引用内容涉及侵权,请及时与本站联系,我们将第一时间处理。

3、其他媒体/个人如需使用本页面原创内容,需注明“来源:[生知库]”并获得授权;使用引用内容的,需自行联系原作者获得许可。

4、投稿及合作请联系:info@biocloudy.com。