Abstract
BACKGROUND: Allergic rhinitis (AR) in children is a common condition with rising global prevalence that substantially impairs patients' quality of life and imposes an economic burden. While Western medicine provides symptom relief, recurrence rates and side effects remain concerns. Traditional Chinese medicine (TCM), through syndrome differentiation, offers an effective, affordable alternative. However, clinical diagnosis in TCM often relies on subjective judgment. Digital tongue image analysis, combined with clinical symptoms and medical history, may enhance the accuracy and objectivity of syndrome differentiation, offering a promising approach to more effective treatment for pediatric AR. This study aimed to assist clinicians in accurately distinguishing between cold and heat syndromes in pediatric patients with AR.
METHODS: A total of 391 children with AR were included in this study. Patients were classified as having cold syndrome (n=92) or heat syndrome (n=299) and were randomly divided into a training set (n=176) and a test set (n=215). A three-stage multimodal deep learning model was developed. First, a Dense Convolutional Network augmented with a Squeeze-and-Excitation module (SE-DenseNet) was used to extract features from tongue images. Second, the independent-samples t-test was used to screen relevant features from patient demographic and clinical information and from patient and family medical history. Third, a transformer model was used to integrate the image and clinical features for cold and heat syndrome classification. Model performance was evaluated using the area under the curve (AUC), accuracy, precision, recall, and F1 score.
RESULTS: The multimodal model outperformed the other models in classifying children with AR as having cold syndrome or heat syndrome, achieving the best AUC, accuracy, precision, recall, and F1 score. In the training set, the AUC, accuracy, precision, recall, and F1 score were 0.931, 0.875, 0.949, 0.869, and 0.920, respectively. In the test set, the AUC, accuracy, precision, recall, and F1 score were 0.877, 0.856, 0.863, 0.829, and 0.910, respectively.
CONCLUSIONS: The multimodal model integrating clinical features with tongue image features demonstrated high accuracy, with the potential to assist pediatricians in syndrome differentiation and treatment decision-making for children with AR. The multimodal model may enable objective and quantifiable diagnostic results, improving efficiency and accuracy.
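The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, the choice of heat syndrome as the positive class (label 1), and the toy labels and scores are all assumptions made for illustration, and none of the values are study data. The abstract does not state which class was treated as positive or whether the metrics were averaged across classes, so the plain binary definitions are used here.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(y_true: Sequence[int],
                           y_score: Sequence[float],
                           threshold: float = 0.5) -> Dict[str, float]:
    """Accuracy, precision, recall, F1 and AUC for binary labels.

    Label 1 is the positive class (assumed here to be heat syndrome);
    the threshold of 0.5 is likewise an assumption.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "auc": roc_auc(y_true, y_score),
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy example only: hypothetical labels (1 = heat, 0 = cold) and scores.
toy_labels = [1, 1, 1, 0, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
for name, value in classification_metrics(toy_labels, toy_scores).items():
    print(f"{name}: {value:.3f}")
```

With imbalanced classes such as those reported here (92 cold versus 299 heat), accuracy alone can look favorable for a model that mostly predicts the majority class, which is one reason to report AUC, precision, and recall alongside it.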