Abstract
BACKGROUND: Traditional Chinese Medicine (TCM) constitution theory posits that constitution is dynamic, yet most research is cross-sectional. This study aimed to bridge that gap by developing machine learning models on longitudinal cohort data to predict dynamic constitutional changes in the elderly. The objectives were twofold: to predict an individual's specific constitution type at a future time point and to classify the transformation trend between assessments.
METHODS: This study used a large-scale longitudinal cohort comprising 54,990 records from the TCM Elderly Constitution Questionnaire (TCMECQ) for model development and 2,181 records for external validation. Five machine learning models were trained and validated: Support Vector Machine (SVM), Random Forest Classifier (RFC), Decision Tree (DT), K-Nearest Neighbors (KNN), and Neural Network (NN). Models were trained and evaluated with 10-fold cross-validation to obtain robust performance estimates. We defined two predictive outcomes: the specific future constitution type (nine-class) and the transformation trend ("better," "indeterminate," or "worse"), classified based on expert consensus. We also evaluated the predictive performance of 13 common biochemical indicators.
RESULTS: For predicting the specific future constitution type, SVM performed best, with an internal validation accuracy of 99.47%. In the core task of classifying the transformation trend, the RFC model was superior, achieving an accuracy of 96.92% (Kappa = 0.9505). However, all models performed poorly when predicting the "worse" trend in external validation. Analyses using biochemical indicators showed moderate performance for identifying the "better" (accuracy >80%) and "indeterminate" (accuracy >90%) states but failed to predict the "worse" trend (accuracy <5%).
CONCLUSION: This study establishes the first machine learning framework to effectively predict the dynamic evolution of TCM constitution. Our findings show that SVM excels at predicting specific future states, while RFC is more adept at capturing overall evolutionary trends. This framework provides a novel quantitative tool for advancing proactive health management aligned with the TCM principle of "Zhi Wei Bing". The models' inability to reliably predict constitutional deterioration highlights the critical need for future research to incorporate multi-modal data that capture complex pathophysiological mechanisms.
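The overall accuracy, the Kappa statistic, and the per-class figures cited in this abstract can all be derived from a single confusion matrix. The following is a minimal sketch of that arithmetic, not the study's actual pipeline. The counts in `CONFUSION` are invented for illustration and are not results from the cohort, and the sketch assumes that the per-class "accuracy" quoted for each trend corresponds to per-class recall (correct predictions divided by true cases of that class).

```python
from typing import Dict, List

LABELS = ["better", "indeterminate", "worse"]

# Hypothetical confusion matrix: rows = true trend, columns = predicted trend.
# These counts are made up for illustration; they are NOT data from the study.
CONFUSION = [
    [80, 15, 5],    # true "better"
    [10, 170, 20],  # true "indeterminate"
    [10, 30, 10],   # true "worse"
]


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all records on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = sum(sum(row) for row in cm)
    p_observed = accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    # Expected agreement if predictions were independent of the true labels.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


def per_class_recall(cm: List[List[int]], labels: List[str]) -> Dict[str, float]:
    """Share of each true class that the model labels correctly."""
    return {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


if __name__ == "__main__":
    print(f"accuracy = {accuracy(CONFUSION):.3f}")
    print(f"kappa    = {cohen_kappa(CONFUSION):.3f}")
    for label, recall in per_class_recall(CONFUSION, LABELS).items():
        print(f"recall[{label}] = {recall:.2f}")
```

In this invented example the overall accuracy is about 0.74 while recall for "worse" is only 0.20, which illustrates why a headline accuracy should be read alongside per-class results when one class, such as constitutional deterioration, is rare.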