Abstract
INTRODUCTION: The saliva microbiome of oral squamous cell carcinoma (OSCC) patients has been increasingly characterized, but cross-cohort studies are lacking, and no non-invasive diagnostic model for OSCC has been evaluated across cohorts. METHODS: We compared saliva microbial composition between OSCC patients and healthy individuals using cross-cohort saliva microbiome data comprising 354 healthy individuals and 311 OSCC patients (total n=665). RESULTS: Saliva microbial composition differed significantly between OSCC patients and healthy individuals. Seven microorganisms were significantly reduced and seven were significantly increased in OSCC patients, making them potential biomarkers. Machine learning models, including Random Forest, Extra Trees, Gradient Boosting, and XGBoost, were constructed to diagnose OSCC from saliva microorganisms. Under leave-one-cohort-out cross-validation, these models achieved area under the curve (AUC) values ranging from 63.1% to 96.9% at both genus and species levels. DISCUSSION: Our study provides a robust non-invasive diagnostic model for OSCC and demonstrates that high diagnostic accuracy is achievable at both genus and species levels, suggesting that taxonomic resolution is not the primary limiting factor. Instead, the choice of model construction method is crucial and deserves greater attention in clinical applications.
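The leave-one-cohort-out evaluation mentioned above can be illustrated with a minimal sketch: each cohort is held out in turn, a model is fitted on the remaining cohorts, and AUC is computed on the held-out cohort only. Everything in the example is an assumption made for illustration and is not the authors' pipeline. The cohort labels, case/control labels, and abundance values are invented toy data, and `fit_direction` is a hypothetical single-feature stand-in for the tree ensembles named in the abstract.

```python
from collections import defaultdict


def leave_one_cohort_out(cohorts):
    """Yield (held_out, train_idx, test_idx) once per distinct cohort label."""
    by_cohort = defaultdict(list)
    for i, cohort in enumerate(cohorts):
        by_cohort[cohort].append(i)
    for held_out in sorted(by_cohort):
        test_idx = by_cohort[held_out]
        train_idx = sorted(
            i for c, idx in by_cohort.items() if c != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(x, y):
    """Hypothetical toy model: +1 if the feature is higher in cases, else -1."""
    case = [v for v, lab in zip(x, y) if lab == 1]
    ctrl = [v for v, lab in zip(x, y) if lab == 0]
    return 1.0 if sum(case) / len(case) >= sum(ctrl) / len(ctrl) else -1.0


# Invented toy data: three cohorts, 1 = OSCC, 0 = healthy, one taxon abundance.
cohorts = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
labels = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
abundance = [0.9, 0.7, 0.2, 0.3, 0.8, 0.4, 0.5, 0.1, 0.6, 0.9, 0.3, 0.2]

results = {}
for held_out, train_idx, test_idx in leave_one_cohort_out(cohorts):
    # Fit only on samples from the other cohorts.
    direction = fit_direction(
        [abundance[i] for i in train_idx], [labels[i] for i in train_idx]
    )
    # Score and evaluate only on the held-out cohort.
    scores = [direction * abundance[i] for i in test_idx]
    results[held_out] = auc([labels[i] for i in test_idx], scores)

for cohort, value in results.items():
    print(f"held-out cohort {cohort}: AUC = {value:.2f}")
```

Because no sample from the held-out cohort is seen during fitting, the per-cohort AUC reflects transfer to an unseen cohort rather than within-cohort fit, which is why the results can vary widely between folds. With scikit-learn available, `sklearn.model_selection.LeaveOneGroupOut` produces the same kind of split when the cohort labels are passed as `groups`.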