Abstract
OBJECTIVES: To explore the correlation between tea consumption and the risk of gastrointestinal diseases using a risk prediction model that integrates interpretable machine learning and a large language model. METHODS: A survey was conducted among patients undergoing both gastroscopy and 13C-urea breath testing at the Gastrointestinal Endoscopy Center of Anxi Hospital of Traditional Chinese Medicine. Univariate analysis was performed to assess the suitability of the selected features. The collected data were randomly divided into training and testing sets in a 7:3 ratio. Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Logistic Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGB), and Deep Neural Network (DNN) classifiers were compared to identify the best classifier for predicting high-risk gastrointestinal conditions. A Bayesian optimization algorithm was used to obtain the optimal hyperparameter combination for each of the 6 models. After model fitting, the interpretability of the best model was analyzed using SHapley Additive exPlanations (SHAP). The DeepSeek-R1 base language model was fine-tuned on the gastrointestinal disease dataset and Chinese medical online consultation data to obtain the final model. RESULTS: The study included 503 participants. All selected features were associated with gastrointestinal diseases, but only age exhibited a significant linear correlation (β=0.023, SE=0.008, t=2.942, P=0.003). The DNN model performed best among the 6 classifiers, with an accuracy of 0.68, precision of 0.68, recall of 0.85, F1 score of 0.75, and AUC of 0.74. The top 3 features by importance were age, DOB value, and smoking history. The fine-tuned large language model provided recommendations, based on gastroscopy results, that were consistent with those of professional physicians. CONCLUSIONS: The DNN model is effective for predicting gastrointestinal disease risk and offers reliable support for clinical risk assessment and decision-making regarding endoscopy. Smoking cessation, moderate alcohol consumption, and reasonable tea intake may help prevent gastrointestinal diseases.
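
The 7:3 random split and the reported evaluation metrics (accuracy, precision, recall, F1 score) can be illustrated with a minimal Python sketch. It is not the study's code: the function names `split_7_3` and `classification_metrics`, the random seed, the rounding rule for the cut point, and the toy labels are all assumptions made for illustration, and AUC is omitted because it requires predicted probabilities rather than hard labels.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle records and split them into ~70% training / ~30% testing.

    The seed and the rounding of the cut point are illustrative assumptions.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = high risk)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical participant IDs standing in for the 503 survey records.
participants = list(range(503))
train_ids, test_ids = split_7_3(participants)

# Toy labels (not study data) to show how the metrics are derived.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
```

Because F1 is the harmonic mean of precision and recall, a model with high recall and moderate precision, as reported for the DNN, has an F1 score between the two values.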