Abstract
Single-cell RNA-seq studies of viral infection are limited by sparse viral reads, under-labeled infected cells, and bystander responses that confound differential expression (DE) analysis. We introduce scDEcrypter, a penalized two-way mixture model that leverages partial infection-status labels together with additional variables such as cell type. Our approach uses data splitting to avoid double dipping (using the same cells both to infer infection status and to test for DE) and enables fast, likelihood-based DE inference. In simulations and in applications to two viral infection datasets, scDEcrypter improved recovery of infected cell states and identified more biologically coherent infection-associated genes and enriched pathways.