Abstract
This study presents CD-DETR, an optimized Detection Transformer (DETR) model for detecting thoracic diseases in chest X-ray (CXR) images. CD-DETR targets the difficulty of detecting minor pathologies in CXRs, particularly in regions where medical resources are unevenly distributed. In the central and western regions of China, a shortage of radiologists means that CXRs from township hospitals are sent to central hospitals for diagnosis, so large numbers of images must be read within a short time. The model uses multi-scale feature fusion, combining Efficient Channel Attention (ECA-Net) and Spatial Attention Upsampling (SAU) to enhance feature representation and improve detection accuracy. It also introduces a dedicated Chest Diseases Intersection over Union (CDIoU) loss function to improve the detection of small targets and mitigate the effects of class imbalance. On the NIH Chest X-ray dataset, CD-DETR achieves a precision of 88.3% and a recall of 86.6%, outperforming other DETR variants by an average of 5% and CNN-based models such as YOLOv7 by 6-8% on these metrics, which indicates its potential for practical use in medical imaging diagnostics.
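To illustrate why small targets motivate a dedicated IoU-based loss, the following minimal sketch computes the standard intersection over union and the plain loss 1 - IoU for two axis-aligned boxes. It is not the CDIoU loss, whose formulation is not given in this abstract; the names `Box`, `iou`, and `iou_loss` and the example coordinates are assumptions made for illustration only.

```python
from typing import Tuple

# Hypothetical box format for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss (1 - IoU); shown only as a baseline, not CDIoU."""
    return 1.0 - iou(pred, target)


# The same 3-pixel localization error on a small and a large target.
small_target: Box = (100.0, 100.0, 110.0, 110.0)   # 10 x 10 box
small_pred: Box = (103.0, 103.0, 113.0, 113.0)
large_target: Box = (100.0, 100.0, 200.0, 200.0)   # 100 x 100 box
large_pred: Box = (103.0, 103.0, 203.0, 203.0)

small_loss = iou_loss(small_pred, small_target)
large_loss = iou_loss(large_pred, large_target)
```

With these example boxes, the identical 3-pixel shift costs the small box far more overlap than the large one, which is one reason plain IoU-style losses tend to behave poorly on minor pathologies.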