Abstract
16S rRNA amplicon sequencing is an effective method for profiling microbial taxonomy in microbiome research, offering lower per-sample costs and higher sample throughput than shotgun metagenomics. Despite these advantages, it depends on precise trimming of low-quality bases at the 3' ends of reads. Given the widespread use of 16S rRNA amplicon sequencing, demand is growing for analysis tools that can identify errors in the 3' region of reads and remove the erroneous bases. Although various algorithms for predicting trim locations are widely used, most are standalone command-line tools, which pose challenges for users with limited computational background or resources. Furthermore, without biological or experimental priors such as amplicon size, trim-position predictions may be unreliable. Here, we introduce PixelCut, a fully automated trim-position prediction framework that requires neither hyperparameters nor prior biological information for accurate prediction. Unlike most available algorithms, which operate on raw FASTQ data, PixelCut analyzes the per-base quality report generated by FastQC to infer trimming positions. Using the recommended quality score threshold from the quality report, PixelCut inspects the quality scores across base positions and automatically determines the recommended trim position using character recognition techniques based on computer vision. We have also developed a user-friendly web application that makes the method accessible to users without programming expertise, along with a command-line version for advanced users. Through comprehensive computer simulations, we show that PixelCut produces taxonomic profiling results consistent with those from popular trim-location prediction algorithms.
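The threshold-based step described above can be illustrated with a minimal sketch. This is not PixelCut's implementation: PixelCut reads the quality report with computer vision, whereas this sketch assumes the per-base quality scores are already available as plain numbers. The function name `predict_trim_position` and the default threshold of 28 are hypothetical choices made for illustration only.

```python
from typing import Sequence


def predict_trim_position(scores: Sequence[float], threshold: float = 28.0) -> int:
    """Return how many bases to keep from the 5' end of each read.

    `scores` holds one summary quality score per base position
    (assumed input; for example, a per-position median).
    `threshold` is an illustrative default, not a value from the paper.
    """
    keep = len(scores)
    # Walk backwards from the 3' end, dropping positions below the threshold.
    while keep > 0 and scores[keep - 1] < threshold:
        keep -= 1
    return keep


# Hypothetical per-base scores that decay toward the 3' end.
example_scores = [34, 34, 33, 32, 30, 29, 27, 25, 20]
trim_at = predict_trim_position(example_scores)
```

The sketch removes only the trailing run of low-quality positions, so an isolated dip in the middle of a read does not shorten the kept region. Whether PixelCut treats such dips the same way is not stated in the text.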