Abstract
BACKGROUND: Low-pass genome sequencing (LP GS) has been widely used for the detection of copy number variations (CNVs). As a key algorithmic parameter of LP GS, window selection may influence its performance. However, few studies have investigated this parameter for the detection of small CNVs.

METHODS: To evaluate the impact of the sliding window on true positive rate, additional interpretation workload and resolution, 40 samples with 19 pre-defined CNVs were simulated at various read amounts. Fifty-seven clinical cases with previously ascertained CMA results (27 positive cases and 30 negative cases) were used to further evaluate the influence of the sliding window on detection sensitivity and specificity.

RESULTS: In general, the true positive rate increased with sequencing depth for the simulated samples. The algorithm sliding a 10-Kb window in 1-Kb increments showed a higher true positive rate, especially for CNVs ≤ 30 Kb. For deletions of 30 Kb, the algorithm sliding a 10-Kb window in 1-Kb increments showed a true positive rate of 100% for all read amounts, while the algorithm sliding a 50-Kb window in 5-Kb increments had a detection sensitivity of 80.0% even at a read amount of 100 M. Overlap analysis showed that the algorithm sliding a 10-Kb window in 1-Kb increments had less variability for both deletions and duplications (especially for CNVs ≤ 30 Kb), indicating higher detection resolution. Taking into account the additional interpretation workload that the 10-Kb window in 1-Kb increments may introduce, 50 M reads is recommended for detecting most small CNVs. For the 57 clinical cases, the algorithm sliding a 50-Kb window in 5-Kb increments and the algorithm sliding a 10-Kb window in 1-Kb increments showed detection sensitivities of 85.19% (23/27) and 96.30% (26/27), respectively. The algorithm sliding a 10-Kb window in 1-Kb increments detected all the CNVs missed by the algorithm sliding a 50-Kb window in 5-Kb increments except for one 25.8-Kb deletion. The specificity of both algorithms was 96.67% (29/30).

CONCLUSION: Window selection, together with sequencing depth, can influence the CNV detection sensitivity and resolution of LP GS for small CNVs. This study provides a set of evaluation methods and pathways based on simulated samples and clinical cases. For CNVs ≤ 30 Kb, a 10-Kb window in 1-Kb increments and ≥ 50 M reads are recommended for LP GS. Clinical laboratories conducting LP GS are advised to determine the range of sensitivity and resolution for CNV detection across different sliding windows and sequencing depths.
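
As an illustration only, the sketch below reproduces the sensitivity and specificity arithmetic reported above and shows how many sliding windows fit entirely inside a 30-Kb CNV under the two window settings. It is not the authors' CNV-calling algorithm. The function names, the CNV coordinates, the 2-Mb region length and the windows being aligned to position 0 are all assumptions made for the example; only the case counts (23/27, 26/27, 29/30) and the window and increment sizes come from the abstract.

```python
def sliding_windows(region_len, size, step):
    """Yield half-open [start, end) windows of `size` bp, advanced by `step` bp."""
    pos = 0
    while pos + size <= region_len:
        yield pos, pos + size
        pos += step


def windows_within(cnv_start, cnv_end, size, step, region_len):
    """Count windows lying entirely inside the CNV interval [cnv_start, cnv_end)."""
    return sum(
        1
        for start, end in sliding_windows(region_len, size, step)
        if start >= cnv_start and end <= cnv_end
    )


def rate_percent(correct, total):
    """Proportion as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# Hypothetical 30-Kb CNV in a hypothetical 2-Mb region (coordinates are assumptions).
REGION_LEN = 2_000_000
CNV_START, CNV_END = 1_000_000, 1_030_000

n_small = windows_within(CNV_START, CNV_END, 10_000, 1_000, REGION_LEN)
n_large = windows_within(CNV_START, CNV_END, 50_000, 5_000, REGION_LEN)

# Counts taken from the abstract: 27 positive and 30 negative clinical cases.
sens_50kb = rate_percent(23, 27)    # 50-Kb window, 5-Kb increments
sens_10kb = rate_percent(26, 27)    # 10-Kb window, 1-Kb increments
specificity = rate_percent(29, 30)  # same for both settings

print(n_small, n_large)                    # 21 0
print(sens_50kb, sens_10kb, specificity)   # 85.19 96.3 96.67
```

In this toy geometry, 21 of the 10-Kb windows lie fully within the 30-Kb CNV, whereas no 50-Kb window can, which is one plausible intuition for the difference in true positive rate for CNVs ≤ 30 Kb. A real pipeline would also depend on read-depth normalization and segmentation steps that are not described here.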