Abstract
Early and accurate detection of cervical cancer through cytology screening is essential for reducing mortality, yet traditional manual microscopy is limited by subjectivity, operator fatigue, and low throughput, which lead to diagnostic inconsistencies. Conventional deep learning models have been investigated for automating this task, but they often struggle to combine accurate spatial boundary segmentation with effective global contextual reasoning, which limits their clinical precision and reliability. This paper introduces SegResDeiT, a hybrid framework for the joint segmentation and classification of cervical cytology images. The model combines a SegNet backbone for pixel-wise segmentation, a ResNet-50 encoder for hierarchical feature extraction, and a Data-efficient Image Transformer (DeiT) head for global context modelling and classification. Evaluated against leading benchmarks, the proposed model achieved an accuracy of 94.47%, precision of 95.66%, recall of 96.47%, and F1-score of 96.06%, together with strong segmentation quality, as shown by a Dice coefficient of 96.06% and an IoU of 92.43%. An ablation study confirmed the complementary contribution of each architectural component, and a computational analysis showed a practical trade-off between performance and inference time. The results indicate that SegResDeiT outperforms current methods, offering a reliable and effective approach that could improve the accuracy and accessibility of automated cervical cancer screening, with considerable potential for practical use.
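For readers less familiar with the reported metrics, the following is a minimal sketch of how F1, Dice, and IoU are conventionally defined. It is not the authors' evaluation code: the function names `f1_from_precision_recall` and `dice_and_iou` and the nested-list mask format are assumptions made for illustration only.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks.

    Masks are nested lists of 0/1 values (an illustrative format, not the
    paper's). Two empty masks are treated as perfect agreement, which is
    a common convention but an assumption here.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]

    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Precision and recall as reported in the abstract, expressed as fractions.
f1 = f1_from_precision_recall(0.9566, 0.9647)

# Toy 2x2 masks (hypothetical data) to exercise the segmentation metrics.
dice, iou = dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

With the reported precision and recall, this definition gives an F1 of about 0.9606, matching the stated F1-score. For a single pair of masks, the two segmentation metrics are linked by IoU = Dice / (2 - Dice). Values averaged over many images need not satisfy this identity exactly.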