Abstract
Amplifying deoxyribonucleic acid (DNA) libraries by the polymerase chain reaction (PCR) can distort the relative abundance of sequences. For this reason, several recent genomic and transcriptomic methods based on next-generation sequencing (NGS) use in vitro transcription (IVT) to amplify template DNA. IVT amplifies nucleic acid sequences linearly, so it is less susceptible to bias than the exponential amplification of PCR. Chromatin integration labeling sequencing (ChIL-seq), a method for analyzing transcription factor binding and histone modifications, uses IVT instead of PCR in its DNA amplification step. This allows it to analyze small samples, including single cells. In this study, we found that many of the reads discarded as PCR duplicates during ChIL-seq data pre-processing are actually amplification products derived from IVT. We then developed an in silico method that selectively removes PCR duplicates from NGS data while retaining IVT-derived amplification products. The method prevents excessive data loss and significantly improves how efficiently NGS data are used.
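The claim that linear amplification is less prone to bias than exponential amplification can be illustrated with a toy model. This is a minimal sketch and is not part of this study. It assumes that two templates start at a true 1:1 ratio and differ only slightly in per-round copying efficiency (the hypothetical values are 0.95 and 0.90 over 20 rounds). In the exponential (PCR-like) case, every product serves as a template in the next round. In the linear (IVT-like) case, only the original templates are copied, and the products are never re-amplified.

```python
def exponential_yield(n0: float, efficiency: float, rounds: int) -> float:
    """PCR-like growth: every copy made in a round is itself a template in the next round."""
    return n0 * (1.0 + efficiency) ** rounds


def linear_yield(n0: float, efficiency: float, rounds: int) -> float:
    """IVT-like growth: only the original templates are copied; products are not re-amplified."""
    return n0 * (1.0 + efficiency * rounds)


def ratio_distortion(yield_fn, eff_a: float, eff_b: float, rounds: int, n0: float = 1.0) -> float:
    """Observed A:B ratio after amplification for two templates with a true 1:1 ratio."""
    return yield_fn(n0, eff_a, rounds) / yield_fn(n0, eff_b, rounds)


if __name__ == "__main__":
    # Hypothetical efficiencies and round count, chosen only for illustration.
    for label, fn in (("exponential (PCR-like)", exponential_yield),
                      ("linear (IVT-like)", linear_yield)):
        print(f"{label}: observed A:B ratio = {ratio_distortion(fn, 0.95, 0.90, 20):.3f}")
```

In this model, the exponential ratio grows as ((1 + E_A) / (1 + E_B)) raised to the number of rounds, so a 5-point efficiency gap becomes roughly a 1.7-fold distortion after 20 rounds. The linear ratio stays close to 1 and can never exceed E_A / E_B. Real PCR and IVT kinetics are more complex than this model.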