Abstract
OBJECTIVE: To explore the feasibility of building an auxiliary diagnostic model for secondary pulmonary bacterial infection in influenza patients using routine test indicators from fever clinics. METHODS: A retrospective analysis was conducted of 510 influenza cases, divided into modeling and validation sets in a 7:3 ratio, with an additional 72 cases selected for external validation. Logistic regression served as the traditional diagnostic model, and two machine learning models (decision tree and random forest) were constructed in R 4.2.3. Diagnostic performance was evaluated using accuracy, sensitivity, specificity, positive predictive value, negative predictive value, the receiver operating characteristic curve, and the area under the curve (AUC). RESULTS: Of the 357 influenza patients in the modeling set, 101 developed secondary pulmonary bacterial infection and 256 did not. Multivariate logistic regression identified age, white blood cell count, C-reactive protein, serum amyloid A, creatine kinase isoenzyme, and D-dimer as independent risk factors for secondary pulmonary bacterial infection (all P<0.05). In the modeling, validation, and external validation sets, the machine learning models generally outperformed the logistic regression model across the diagnostic performance metrics. The random forest model performed particularly well on all three datasets, with AUC values of 0.951, 0.902, and 0.852, respectively. CONCLUSION: Among auxiliary diagnostic models built on routine fever clinic test indicators, machine learning models, especially the random forest model, show high diagnostic accuracy and good generalization ability for influenza-related secondary pulmonary bacterial infection.
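The evaluation indicators listed in the methods can be illustrated with a minimal sketch. It is written in Python rather than the R 4.2.3 environment used in the study, and it is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only to show how each indicator is defined.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); label 1 = secondary bacterial infection, 0 = none."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num: float, den: float) -> float:
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def diagnostic_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, PPV and NPV from binary predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "sensitivity": safe_div(tp, tp + fn),  # true positive rate
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "ppv": safe_div(tp, tp + fp),          # positive predictive value
        "npv": safe_div(tn, tn + fn),          # negative predictive value
    }


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]  # model-predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

if __name__ == "__main__":
    for name, value in diagnostic_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.3f}")
    print(f"auc: {auc(y_true, scores):.4f}")
```

The threshold-based indicators depend on the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs, which is one reason the two kinds of measure are usually reported together.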