Abstract
BACKGROUND: Oral leukoplakia (OL) is a potentially malignant disorder of the oral mucosa, and accurate prediction of malignant transformation (MT) remains a clinical challenge. This study aimed to develop and evaluate a machine learning model that integrates histopathological, demographic, and DNA content features to predict MT risk in OL. METHODS: We conducted a retrospective cohort study of 97 OL cases (18 with confirmed MT and 79 non-transformed controls) selected from a larger series. Each case included clinicopathological features and DNA content data obtained by flow cytometry for the cell cycle phases (G1, S-phase, and G2) and for excess DNA beyond the tetraploid region (4cER). All cases had either a minimum 5-year follow-up or histologically confirmed transformation. A multilayer perceptron (MLP) model was trained on 27 features. Stratified five-fold cross-validation and minority class oversampling (positive filling) were used to improve learning and mitigate class imbalance. Performance was evaluated using accuracy, sensitivity, specificity, F1-score, AUC, and Kaplan-Meier survival analysis. RESULTS: Significant predictors of MT included 4cER (p = 0.005), G2 phase (p = 0.04), dysplasia grading (p = 0.003), and inflammatory infiltrate (p = 0.01). The optimized model yielded 72% sensitivity, 96% specificity, and an AUC of 85.4%. Survival analysis showed significantly poorer outcomes in cases that the model classified as high risk (p < 0.0001). CONCLUSION: Integrating DNA content analysis with machine learning provides an objective and clinically useful model to stratify malignant risk in OL, complementing conventional histopathology and supporting personalized patient management.
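
The resampling scheme named in METHODS (stratified five-fold cross-validation with minority class oversampling) can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes that "positive filling" means random duplication of MT cases and that it is applied to the training folds only. The function names `stratified_folds`, `oversample_minority`, and `sensitivity_specificity` are hypothetical. Only the class sizes (18 MT, 79 controls) come from the abstract; no model is trained and no reported result is reproduced.

```python
import random
from collections import Counter


def stratified_folds(labels, k=5, seed=0):
    """Split indices into k folds that each keep roughly the overall class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def oversample_minority(train_idx, labels, seed=0):
    """Assumed reading of 'positive filling': duplicate random minority
    cases until both classes have the same count in the training set."""
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in train_idx)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    pool = [i for i in train_idx if labels[i] == minority]
    deficit = counts[majority] - counts[minority]
    return train_idx + [rng.choice(pool) for _ in range(deficit)]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Class sizes taken from the abstract: 1 = MT, 0 = non-transformed control.
labels = [1] * 18 + [0] * 79
folds = stratified_folds(labels, k=5, seed=0)

for held_out, test_idx in enumerate(folds):
    train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
    balanced = oversample_minority(train_idx, labels, seed=held_out)
    counts = Counter(labels[i] for i in balanced)
    # Columns: fold number, held-out size, MT count and control count after oversampling.
    print(held_out, len(test_idx), counts[1], counts[0])
```

Oversampling after the split, rather than before it, keeps copies of a held-out case out of the training data, so the held-out metrics are not inflated by duplicated cases.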