Abstract
Single-cell RNA sequencing (scRNA-seq) enables high-resolution analysis of gene expression at the level of individual cells, and clustering is a critical step for identifying distinct cell populations. Because scRNA-seq data are high-dimensional and sparse, existing approaches typically perform gene selection before clustering. However, treating feature selection as a separate preprocessing step ignores the latent cluster structure and does not guarantee that the selected genes are informative for clustering, which often leads to suboptimal results. To address this limitation, we propose FSSC (Feature Selection for scRNA-seq Clustering), a unified framework for joint feature selection and clustering in scRNA-seq analysis. FSSC integrates a zero-inflated negative binomial (ZINB) autoencoder with a group Lasso penalty and a dedicated clustering loss. This joint optimization enables the model to simultaneously learn low-dimensional representations and select a compact set of cluster-discriminative genes, preserving both the statistical characteristics of scRNA-seq data and the underlying cluster structure. Extensive experiments on both simulated and real scRNA-seq datasets demonstrate that FSSC consistently outperforms state-of-the-art methods in clustering accuracy and identifies a small, biologically meaningful set of marker genes.
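
To make the two best-specified ingredients concrete, the following is a minimal sketch of a ZINB negative log-likelihood and a group Lasso penalty, written in plain Python. It is an illustration under stated assumptions, not the FSSC implementation: the abstract does not say how weights are grouped, so the sketch assumes one group per gene, formed by that gene's row of first-layer encoder weights. The function names, the tolerance `tol`, and the parameterization (mean `mu`, dispersion `theta`, dropout probability `pi`) are likewise hypothetical. The clustering loss is omitted because the abstract does not define it.

```python
import math


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count x under a ZINB distribution.

    Assumed parameterization: mu is the NB mean (> 0), theta the NB
    dispersion (> 0), and pi the dropout probability in [0, 1).
    """
    # log NB(0; mu, theta) = theta * log(theta / (theta + mu))
    log_nb_zero = theta * (math.log(theta) - math.log(theta + mu))
    if x == 0:
        # A zero is either a dropout (probability pi) or a genuine NB zero.
        return -math.log(pi + (1.0 - pi) * math.exp(log_nb_zero))
    log_nb = (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + log_nb_zero
        + x * (math.log(mu) - math.log(theta + mu))
    )
    # A non-zero count can only come from the NB component.
    return -(math.log(1.0 - pi) + log_nb)


def group_lasso_penalty(first_layer_weights):
    """Sum of L2 norms, one group per gene (assumed: one row per gene)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in first_layer_weights)


def selected_genes(first_layer_weights, tol=1e-8):
    """Indices of genes whose weight group has not been shrunk to zero."""
    return [
        g for g, row in enumerate(first_layer_weights)
        if math.sqrt(sum(w * w for w in row)) > tol
    ]
```

Under these assumptions, a joint objective would add the ZINB reconstruction term, a weighted group Lasso penalty, and a weighted clustering loss. Because the penalty acts on whole rows, it can drive every weight of an uninformative gene to zero together, and the genes with non-zero rows form the selected set.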