Abstract
We conducted a cross-sectional oral cancer screening study in Northeast India to develop and validate a risk prediction model for oral precancer/cancer and to compare epidemiologic profiles between tobacco pouch keratosis and oral precancer/cancer. During 2018-2022, we recruited 14,749 participants, who completed an interviewer-administered questionnaire and underwent oral examination (visual inspection and autofluorescence). Logistic regression was used to compare risk factors between tobacco pouch keratosis and precancer/cancer and to develop a risk model for prevalent lesions (keratosis and oral precancer/cancer, combined). The model was validated internally and externally (in the Kerala oral cancer screening trial). Among the 14,749 participants, dentists' diagnoses identified 1,365 lesions: 249 benign lesions (prevalence = 1.6%), 795 cases of tobacco pouch keratosis (prevalence = 5.4%), and 321 precancers/cancers (prevalence = 2.2%). Agreement between dentists and health workers was high for visual diagnosis of prevalent lesions (keratotic/precancer/cancer; positive agreement = 87.5%; kappa = 0.77; 95% confidence interval [CI] = 0.75-0.78). Risk factor profiles were similar between tobacco pouch keratosis and oral precancer/cancer. The risk prediction model (based on age, sex, education, income, chewing duration, chewing type, smoking duration and intensity, and alcohol duration and intensity) had good discrimination (area under the curve [AUC] = 0.83) and calibration (expected/observed [E/O] ratio = 1.00) internally. Further, the 30% of individuals with the highest model-predicted risk accounted for 81.8% of prevalent lesions. However, in external validation, the risk model had modest discrimination (AUC = 0.67; 95% CI = 0.66-0.68) and poor calibration (E/O ratio = 0.52; 95% CI = 0.50-0.54). Our results suggest that tobacco pouch keratosis is an early carcinogenic event amenable to behavioral interception.
The poor transportability of our risk model reflects the need for prediction models that account for geographic differences in risk factors across regions within India.
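As a rough illustration of how the three performance measures reported above (AUC, E/O ratio, and the share of prevalent lesions captured in the highest-risk 30%) are commonly computed, the following Python sketch may help. It is not the study's analysis code: the function names are hypothetical, and the ten-person cohort is made up purely for demonstration.

```python
# Illustrative sketch only: names and data are assumptions, not study data.

def auc(y_true, y_score):
    """Probability that a random case outranks a random non-case (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def expected_observed_ratio(y_true, y_score):
    """Sum of predicted risks (expected cases) divided by observed cases."""
    return sum(y_score) / sum(y_true)


def share_of_cases_in_top(y_true, y_score, fraction):
    """Fraction of all cases found among the top `fraction` of predicted risks."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    k = round(fraction * len(ranked))
    return sum(y for _, y in ranked[:k]) / sum(y_true)


# Hypothetical toy cohort: 10 participants, 3 with a prevalent lesion.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_score = [0.80, 0.10, 0.60, 0.30, 0.05, 0.20, 0.40, 0.50, 0.15, 0.10]

print("AUC:", auc(y_true, y_score))
print("E/O ratio:", expected_observed_ratio(y_true, y_score))
print("Share of cases in top 30%:", share_of_cases_in_top(y_true, y_score, 0.30))
```

Under these definitions, an E/O ratio below 1 (such as the 0.52 seen in external validation) means the model predicted fewer lesions than were observed, whereas an AUC closer to 0.5 means the model ranks cases and non-cases little better than chance.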