Abstract
Histology images offer a cost-effective alternative to spatial transcriptomics for predicting cellular phenotypes. However, existing methods struggle with the accuracy of individual gene expression prediction and cannot resolve fine-grained transcriptional cell types. We present Hist2Cell, a vision graph-transformer framework that accurately resolves fine-grained cell types directly from histology images. Trained on human lung and breast cancer datasets, Hist2Cell predicts cell-type abundance with high accuracy (Pearson correlation above 0.80) and captures cellular colocalization. Moreover, it generalizes without re-training to large-scale cohorts from The Cancer Genome Atlas (TCGA), facilitating survival prediction by revealing distinct tissue microenvironments and relationships between cell types and patient mortality. Thus, Hist2Cell enables cost-efficient analysis for large-scale spatial biology studies and precise cancer prognosis.