Abstract
High-resolution histology images are indispensable for pathology and increasingly serve as the structural backbone for spatial omics. Yet whole-slide images (WSIs) frequently contain artifacts, acellular voids, and background regions that, when included in computational workflows, introduce noise, degrade model accuracy, and compromise biological interpretation. Existing tools provide only coarse foreground-background separation, leaving a gap in fine-grained quality control (QC). Here we present HistoSweep, a scalable framework that generates morphology-aware tissue masks at cellular resolution. By integrating density filtering, texture descriptors, and adaptive thresholding, HistoSweep systematically removes non-informative tissue regions while preserving biologically meaningful microstructures. It processes billion-pixel WSIs in minutes on standard CPUs, requiring no GPU acceleration, and is deployable across research and clinical settings. Across 25 WSIs spanning distinct tissues, disease states, and spatial omics platforms, HistoSweep consistently outperformed existing methods. It enhanced visualization and segmentation, improved virtual cell type predictions, and safeguarded spatial transcriptomics integrity by detecting transcript leakage and transcript-histology misalignment. By enabling fine-grained, scalable QC, HistoSweep provides a foundational preprocessing step for reliable and reproducible digital pathology and spatial omics analyses.