Abstract
Cervical cancer, predominantly caused by Human Papillomavirus (HPV) infection, remains a significant global health burden for women, contributing to elevated morbidity and mortality rates. Early and accurate prediction is critical for improving patient outcomes and optimizing healthcare resource allocation. While machine learning (ML) and deep learning (DL) methods, such as support vector machines, random forests, and convolutional neural networks, have demonstrated promise in disease prediction, they face challenges in model interpretability and computational efficiency, and they rely on large labeled datasets. Additionally, conventional diagnostic methods such as piezoresistive, piezoelectric, and optical lever techniques are often cost-prohibitive and complex, limiting widespread use. This study proposes a hybrid ML framework that integrates H2O AutoML with autoencoder-based feature extraction and Fisher Score-based feature selection. To enhance model transparency and clinical trust, Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) are employed. The workflow begins with exploratory data analysis (EDA) and dimensionality reduction using a stacked autoencoder, followed by selection of the top predictive features via Fisher Score. The refined feature set is used to train multiple models via H2O AutoML, from which the best-performing deep learning model is selected. On the training dataset, the selected model achieved 95.24% accuracy, an AUC of 98.10%, and a log loss of 0.1747. Cross-validation confirms the model's robustness, with consistent AUC and log loss values. At the optimal F1 threshold of 0.517, the confusion matrix indicates an error rate of 5.75% for actual negatives and 2.59% for actual positives, giving an overall error rate of 4.14%. LIME and SHAP are used to interpret predictions at the instance level, providing actionable insights for clinicians.
These results demonstrate the effectiveness of combining AutoML with explainable AI and advanced feature engineering to enhance the predictive power and interpretability of cervical cancer risk models, offering a scalable solution for clinical decision support.
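The Fisher Score selection step mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact formula used, so this assumes the common definition: for each feature, the class-size-weighted between-class scatter of the class means divided by the class-size-weighted within-class variance. The function names `fisher_scores` and `top_k_features` and the synthetic data are hypothetical and only serve the illustration; in the described workflow the input would be the autoencoder-encoded features rather than random values.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher Score (assumed standard definition, not taken from the paper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        # Between-class scatter: how far each class mean sits from the overall mean.
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        # Within-class scatter: spread of the feature inside each class.
        within += n_c * Xc.var(axis=0)
    # Small constant guards against division by zero for constant features.
    return between / (within + 1e-12)

def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher Score, best first."""
    scores = fisher_scores(X, y)
    return np.argsort(scores)[::-1][:k]

# Synthetic illustration: column 1 separates the two classes, columns 0 and 2 are noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
informative = rng.normal(loc=y * 3.0, scale=1.0)
noise = rng.normal(size=(200, 2))
X = np.column_stack([noise[:, 0], informative, noise[:, 1]])

scores = fisher_scores(X, y)
selected = top_k_features(X, y, k=1)
print(scores, selected)
```

Features with well-separated class means and small within-class spread score highest, which is why the informative column is ranked first in this toy data.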