Abstract
BACKGROUND: High-grade cervical intraepithelial neoplasia (CIN2/3) is a critical precursor to cervical cancer, yet current screening methods (e.g., HPV testing, colposcopy) are limited by accessibility and invasiveness, especially in resource-limited settings. We aimed to develop a non-invasive, machine learning (ML)-based model that uses routine blood biomarkers to assess the risk of high-grade CIN and could serve as a triage tool before colposcopy.
METHODS: Data were collected from two groups: 128 patients with high-grade CIN (CIN2/3) and 120 patients with low-grade CIN (CIN1). A total of 29 clinical characteristics and blood test measurements were considered for model development. Four feature selection algorithms (F-test, LASSO regression, decision tree, and random forest) were used to identify key predictors, and 11 machine learning algorithms were employed for model training. The dataset was split into training (70%) and testing (30%) cohorts. Model performance was evaluated using learning curves, receiver operating characteristic (ROC) curves, area under the curve (AUC), Brier score, calibration curves, precision-recall (PR) curves, and decision curve analysis (DCA). Feature importance was assessed using the SHapley Additive exPlanations (SHAP) approach. A web-based calculator was developed for clinical deployment.
RESULTS: Key features selected for the model included creatinine (CREA), red blood cell count (RBC), neutrophil ratio (NEU%), direct bilirubin (DBIL), and monocyte count (MON). The Support Vector Machine (SVM) algorithm achieved the best predictive performance, with an AUC of 0.75 (95% CI: 0.69–0.80) and a Brier score of 0.21 (95% CI: 0.17–0.28). Using SHAP, we identified the variables that contributed to the model's predictions. The web tool (https://dvhl6xsf29zmdewixjx7kz.streamlit.app) provides real-time risk stratification.
CONCLUSIONS: The model demonstrated strong performance across various validation metrics, with the SVM algorithm achieving an AUC of 0.75, indicating potential clinical utility. We also developed a web-based calculator to estimate the risk of high-grade CIN.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-025-03321-z.
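ILLUSTRATIVE SKETCH: To make the two headline metrics concrete, the following minimal Python sketch computes an AUC and a Brier score from predicted probabilities, together with a percentile-bootstrap 95% confidence interval. The abstract does not state how the confidence intervals were obtained, so the bootstrap is an assumption here. The function names (auc_score, brier_score, bootstrap_ci) are hypothetical, and the labels and probabilities are toy values, not study data.

```python
import random


def auc_score(y_true, y_prob):
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with a single class so AUC stays defined
        ps = [y_prob[i] for i in idx]
        stats.append(metric(ys, ps))
    stats.sort()
    lo_idx = round((alpha / 2) * (n_boot - 1))
    hi_idx = round((1 - alpha / 2) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]


# Toy example: 1 = high-grade CIN, 0 = low-grade CIN (hypothetical values).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1, 0.8, 0.3]

auc = auc_score(y_true, y_prob)
brier = brier_score(y_true, y_prob)
auc_lo, auc_hi = bootstrap_ci(auc_score, y_true, y_prob)
print(f"AUC = {auc:.4f} (95% CI: {auc_lo:.2f} to {auc_hi:.2f})")
print(f"Brier score = {brier:.3f}")
```

A higher AUC indicates better discrimination between the two groups, while a lower Brier score indicates predicted probabilities that are closer to the observed outcomes. In practice these quantities would be computed on the held-out 30% testing cohort rather than on toy values.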