Abstract
BACKGROUND: Declining sequencing costs, coupled with the increasing availability of easy-to-use kits for isolating DNA and RNA transcripts from single cells, have driven a rapid proliferation of studies centered on genomic and transcriptomic data. Simultaneously, a wealth of new techniques has been developed that use single-cell technologies to interrogate a broad range of cell-biological processes. One recently developed technique, assay for transposase-accessible chromatin (ATAC) with select antigen profiling by sequencing (ASAPseq), combines chromatin accessibility assessments with measurements of cell-surface marker expression levels. While software exists for the characterization of these datasets, no tool is currently designed explicitly to reformat ASAP surface marker FASTQ data into a count matrix that can then be used for these downstream analyses.
RESULTS: To address this lack of a dedicated tool for ASAPseq data processing, we created CountASAP, an easy-to-use Python package purpose-built to transform FASTQ files from ASAP experiments into count matrices compatible with commonly used downstream bioinformatic analysis packages. CountASAP takes advantage of the independence of the relevant data structures to perform fully parallelized matches of each sequenced read to user-supplied input ASAP oligos and unique cell-identifier sequences. We directly compare the performance and user-friendliness of CountASAP to existing tools using similarly structured data from a more common sequencing experiment: cellular indexing of transcriptomes and epitopes by sequencing (CITEseq). Further benchmarking against existing tools helps to identify proper defaults for CountASAP and assess the agreement of outputs from all tested software. A final test using a novel ASAPseq dataset provides evidence that CountASAP can generate biologically meaningful results that correlate well with paired chromatin accessibility data.
CONCLUSIONS: CountASAP shows good agreement with existing, well-tested data processing tools in the analysis of similarly structured benchmarking data. CountASAP runs efficiently on a standard laptop, offers user-friendly documentation and a one-step installation, and is the first and only tool designed specifically for processing ASAPseq data.