Abstract
BACKGROUND: Cardiovascular disease (CVD) is a major cause of morbidity and mortality among older people living with HIV/AIDS (PLWHA). Accurate and early risk assessment in this population remains challenging with conventional methods, whereas machine learning models have shown strong predictive capability. This study aimed to develop a machine learning–based prediction model to identify high-risk individuals and support early clinical intervention.
METHODS: This retrospective study included 1916 PLWHA aged ≥50 years who received treatment at the Fourth People’s Hospital of Nanning between January 2011 and December 2021. Predictive features were selected using least absolute shrinkage and selection operator (LASSO) regression and incorporated into five commonly used machine learning algorithms to construct CVD risk prediction models. Model performance was evaluated and compared using metrics including the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. Shapley Additive Explanations (SHAP) were used to analyze feature importance and to investigate interactions between variables.
RESULTS: The crude incidence rate of CVD in this cohort of older PLWHA was 1759.27 per 100,000 person-years. Among the five machine learning models evaluated, the Light Gradient Boosting Machine (LightGBM) achieved the highest AUC (94.8%) and excellent specificity (98.0%), along with a precision of 79.5% and an F1-score of 76.4%, indicating strong discriminatory power for identifying high-risk patients. SHAP analysis identified the five most influential features contributing to the LightGBM model’s predictions: duration of antiretroviral therapy (ART), baseline CD4+ T-cell count, baseline viral load, ART regimen, and triglyceride level. Further interaction analysis revealed strong interactions between ART duration and virological, immunological, and metabolic factors.
CONCLUSION: LightGBM is a promising machine learning model for predicting the risk of CVD in older PLWHA. It may serve as a valuable tool for the early identification of high-risk individuals and support more precise strategies for the prevention and management of CVD in this aging population.
CLINICAL TRIAL NUMBER: Not applicable.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12879-025-12287-2.
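The evaluation metrics named in the abstract (sensitivity, specificity, precision, F1-score) and the crude incidence rate per 100,000 person-years all follow from standard definitions. The sketch below illustrates those definitions in Python. The class name, the function names, and every count used are hypothetical and chosen only for illustration; none of them are taken from the study cohort or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container, not from the paper)."""
    tp: int  # high-risk patients correctly flagged
    fp: int  # low-risk patients incorrectly flagged
    tn: int  # low-risk patients correctly not flagged
    fn: int  # high-risk patients missed


def sensitivity(c: ConfusionCounts) -> float:
    """Proportion of true cases that the model identifies (also called recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Proportion of non-cases that the model correctly leaves unflagged."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Proportion of flagged patients who are true cases."""
    return c.tp / (c.tp + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and sensitivity."""
    p, r = precision(c), sensitivity(c)
    return 2 * p * r / (p + r)


def crude_incidence_per_100k(events: int, person_years: float) -> float:
    """Crude incidence rate: events per 100,000 person-years of follow-up."""
    return events * 100_000 / person_years


# Illustrative counts only; these are NOT the study's data.
example = ConfusionCounts(tp=30, fp=10, tn=150, fn=10)

if __name__ == "__main__":
    print(f"sensitivity = {sensitivity(example):.3f}")
    print(f"specificity = {specificity(example):.3f}")
    print(f"precision   = {precision(example):.3f}")
    print(f"F1-score    = {f1_score(example):.3f}")
    print(f"incidence   = {crude_incidence_per_100k(5, 1000):.2f} per 100,000 person-years")
```

Because the F1-score combines precision and sensitivity, a model can pair very high specificity with a more moderate F1-score when cases are rare, which is one reason to read the metrics together rather than in isolation.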