Abstract
Community-acquired pneumonia (CAP) remains one of the leading causes of death among children under the age of 5, making timely and accurate diagnosis crucial for treatment. Although numerous artificial intelligence methods for pneumonia diagnosis have shown reliable results, most still have two significant limitations: (a) they rely on single-modality data, which cannot capture comprehensive clinical information; and (b) they do not account for differences in posture among pediatric patients, which constrains diagnostic performance and generalizability. To address these challenges, we construct a real-world pediatric CAP dataset from tertiary hospital records and develop a multimodal framework for the precise diagnosis of pediatric CAP. To simulate the clinical diagnostic workflow, the model integrates frontal chest X-ray images with laboratory test results and clinical texts, enabling comprehensive assessment of diverse symptoms and improving diagnostic accuracy and generalizability. Experiments on the constructed dataset show that our multimodal approach achieves an accuracy of 94.2%, a significant improvement over single-modality baselines, supporting its potential as an auxiliary diagnostic tool for pediatric CAP screening in clinical practice.