Abstract
BACKGROUND: Cardiovascular disease (CVD) is a leading cause of morbidity and mortality among postmenopausal women in rural India, where healthcare resources remain limited. OBJECTIVES: This study aimed to use artificial intelligence (AI) and machine learning (ML) approaches to predict CVD risk in rural elderly women, identify key clinical predictors, and assess model performance using interpretable AI tools. METHODS: This observational cross-sectional study was conducted in Singur Block (West Bengal) and Amdanga Block (North 24 Parganas District) between March 2014 and August 2018. Data from 458 rural postmenopausal women were analyzed. The outcome variable was the presence or absence of elevated CVD risk, defined using composite International Diabetes Federation and American Heart Association criteria. Predictors included waist circumference, blood pressure, fasting blood glucose, HDL cholesterol, triglycerides, and vitamin D concentration. Seven ML models [Random Forest, Gradient Boosting, Ensemble (Voting Classifier), Extra Trees, Support Vector Machine, Neural Network, and Logistic Regression] were developed and compared. Models were evaluated using 5-fold cross-validation, with accuracy, AUC, precision, recall, and F1 score as metrics. RESULTS: Among the 458 participants, 171 (37.3%) exhibited elevated CVD risk. The Random Forest model achieved an accuracy of 98.91% (95% CI: 97.8%, 99.6%), whereas eXtreme Gradient Boosting (XGBoost) demonstrated comparable performance with an AUC of 0.998 (95% CI: 0.993, 1.000), precision of 97.2%, and recall of 98.3%. Feature-importance analysis identified waist circumference, blood pressure, and fasting glucose as the strongest predictors, with HDL cholesterol and vitamin D contributing modestly but significantly. CONCLUSIONS: ML models, particularly Random Forest and XGBoost, demonstrated high accuracy and interpretability in predicting CVD risk among rural postmenopausal women. These findings highlight the potential of AI-driven, low-cost predictive tools for early CVD risk detection and personalized preventive healthcare in resource-limited rural settings.
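The evaluation protocol named in METHODS (5-fold cross-validation scored by accuracy, precision, recall, and F1) can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which software was used, and the function names `classification_metrics` and `k_fold_indices` are hypothetical, introduced here only for illustration. The sketch assumes binary labels with 1 denoting elevated CVD risk, and uses a plain shuffled split rather than a stratified one.

```python
import random


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels.

    Assumption: 1 = elevated CVD risk (positive class), 0 = not elevated.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = tp + tn + fp + fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation.

    Each sample appears in exactly one test fold. The seed is an arbitrary
    illustrative choice, not a value taken from the study.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx
```

In use, a model would be fitted on each `train_idx` subset, its predictions on the matching `test_idx` subset passed to `classification_metrics`, and the five results averaged. With 458 samples and `k=5`, the test folds contain 91 or 92 samples each.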