Abstract
The cost, quality, and throughput of DNA sequencing continue to improve. Algorithmic innovations have allowed inference of a growing range of features from DNA sequencing data, quantifying nuclear, mitochondrial, and evolutionary aspects of both germline and somatic genomes. To automate analyses of the full range of genomic characteristics, we created an extensible Nextflow metapipeline called metapipeline-DNA. It analyzes targeted and whole-genome sequencing data from raw reads through preprocessing, feature detection by multiple algorithms, quality control, and data visualization. Each step can be run independently and is supported by robust software engineering, including automated failure recovery, granular testing, and consistent validation of inputs, outputs, and parameters. Metapipeline-DNA is cloud-compatible and highly configurable, with options to subset and optimize each analysis. Metapipeline-DNA facilitates large-scale, comprehensive analysis of DNA sequencing data and is open source under the GPLv2 license.