Abstract
Cell Painting (CP) is a high-throughput imaging technology that generates extensive cell-stained imaging data, providing unique morphological insights for biological research. However, CP data are affected by three types of technical effects, which we refer to as triple effects: batch effects and gradient-influenced row and column effects (collectively, well-position effects). The interplay of these technical effects can obscure true biological signals and complicate the characterization of CP data, making correction essential for reliable analysis. Here, we propose cpDistiller, a triple-effect correction method designed specifically for CP data. It couples a pre-trained segmentation model with a semi-supervised Gaussian mixture variational autoencoder that uses contrastive and domain-adversarial learning. Through extensive qualitative and quantitative experiments across various CP profiles, we demonstrate that cpDistiller effectively corrects triple effects, especially well-position effects, while preserving cellular heterogeneity. Moreover, cpDistiller captures system-level phenotypic responses to genetic perturbations and reliably infers gene functions and interactions, both on its own and in combination with scRNA-seq data. cpDistiller also shows promising capability for identifying gene and compound targets, highlighting its potential utility in drug discovery and broader biological research.