Abstract
MOTIVATION: DNA copy number variations (CNVs) have a profound impact on major human genetic disorders. Although multiple sequencing technologies have become first-line approaches for the molecular diagnosis of CNVs, existing tools cannot assess the pathogenicity of CNVs directly from raw sequencing data. RESULTS: We developed CNVSeeker, a one-stop, easy-to-use pipeline that provides comprehensive analysis from raw sequencing data to variant interpretation reports. It supports multiple types of sequencing data: short-read data, such as whole genome sequencing and whole exome sequencing data, and long-read data from the Pacific Biosciences HiFi or Oxford Nanopore Technologies platforms. In extensive benchmarking, CNVSeeker performed comparably to or better than state-of-the-art methods for CNV calling. Moreover, CNVSeeker enables precise variant classification, with an accuracy of ∼87%. Applying CNVSeeker to 1946 individuals with autism spectrum disorder (ASD), we identified 133 ASD-associated CNVs in 122 patients, a diagnostic yield of ∼6.3%. We also provide a user-friendly web server for intuitive visualization of results. This study highlights the potential of CNVSeeker to benefit clinicians and geneticists with limited bioinformatics skills by helping them interpret CNVs directly from various types of raw sequencing data for auxiliary disease diagnosis. AVAILABILITY AND IMPLEMENTATION: The web server is freely available at https://genemed.tech/cnvseeker and the open-source code can be found at https://github.com/lovelycatZ/CNVSeeker.