Abstract
BACKGROUND AND OBJECTIVES: We aimed to identify and optimize the set of factors associated with allergic diseases among school-age children aged 6-14 years using machine-learning and deep-learning algorithms. METHODS: We performed a cross-sectional survey in eight primary schools and 16 middle schools using a cluster sampling strategy. Data were collected by questionnaire. Machine-learning and deep-learning algorithms were implemented in Python (v3.7.6). RESULTS: Of 11308 children enrolled, 4375 had allergic diseases. The prevalence of asthma, allergic rhinitis and eczema was 6.31% (N=713), 25.36% (N=2868) and 21.38% (N=2418), respectively. Of 12 machine-learning algorithms, Gaussian naive Bayes (NB) outperformed the others for asthma, Bernoulli NB for rhinitis and multinomial NB for eczema. Minimal sets of six, five and five key factors were identified for asthma (episodes of upper and lower respiratory infection, age, gender, family history of diabetes and dental caries), rhinitis (episodes of upper respiratory infection, age, gender, maternal education and family history of diabetes) and eczema (episodes of upper respiratory infection, age, maternal education, outdoor activities and dental caries), respectively. CONCLUSIONS: We identified three minimal sets of factors that capture most of the information in the full set of factors and can accurately predict the risk of asthma, rhinitis and eczema among children aged 6-14 years.
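The reported prevalences can be reproduced from the counts given above. The following is a minimal arithmetic sketch, not the authors' code: the variable and function names are hypothetical, and it assumes that the 4375 children with allergic diseases are those with at least one of the three conditions listed.

```python
# Counts as reported in the abstract.
ENROLLED = 11308
CASES = {"asthma": 713, "allergic rhinitis": 2868, "eczema": 2418}
ANY_ALLERGIC = 4375  # assumed: children with at least one of the three conditions


def prevalence(cases, total):
    """Return prevalence as a percentage rounded to two decimals."""
    return round(100 * cases / total, 2)


# Per-condition prevalence and overall prevalence of any allergic disease.
prevalences = {name: prevalence(n, ENROLLED) for name, n in CASES.items()}
overall = prevalence(ANY_ALLERGIC, ENROLLED)

# Under the assumption above, the per-condition counts sum to more than the
# number of affected children, so some children have more than one condition.
overlap_excess = sum(CASES.values()) - ANY_ALLERGIC

for name, pct in prevalences.items():
    print(f"{name}: {pct:.2f}%")
print(f"any allergic disease: {overall:.2f}%")
print(f"diagnoses beyond one per affected child: {overlap_excess}")
```

The per-condition values match the percentages stated in the results. The positive excess indicates co-occurring conditions, provided the assumption about the 4375 figure holds.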