Abstract
Cervical cancer remains a leading cause of cancer-related mortality among women, particularly in resource-limited settings where cytological screening is hindered by the labour-intensive and error-prone nature of manual analysis. This study presents a joint segmentation and classification framework that integrates U-Net with a CNN ensemble to automate cervical cell analysis and improve its accuracy. The U-Net architecture delineates nuclear and cytoplasmic boundaries in Pap smear images, leveraging skip connections to preserve spatial resolution in complex cellular environments. Following segmentation, an ensemble classifier combining ResNet-50, VGG-16, and InceptionV3 differentiates cervical cell types with high precision. The proposed method was evaluated on benchmark datasets (APACS-23, Cx22, SIPaKMeD) using stratified five-fold cross-validation and external validation cohorts. It achieved a mean Dice coefficient of 0.934 for segmentation and a classification accuracy of 94.6%, outperforming single-model baselines by 5–8% in F1-score and AUC. Grad-CAM visualisations showed that the model attends to diagnostically relevant regions, reinforcing clinical interpretability, and ablation studies confirmed that the joint learning architecture substantially enhances feature localisation and classification robustness. This research contributes a deployable, interpretable AI-based pipeline for real-time cervical cancer screening in both centralised laboratories and low-resource healthcare environments. Future directions include expanding dataset diversity, integrating transformer-based models, and developing lightweight implementations for edge devices.