Abstract
PURPOSE: The growing use of medical imaging in diagnosis and clinical workflows generates an increasing volume of material suitable for secondary use. However, this valuable resource often remains underutilized due to non-standardized formatting and annotation. Our study aims to devise a validated annotation model, based on real clinical data, for standardizing medical images and facilitating their reuse.

METHODS: We extracted a dataset of 20,000 DICOM X-ray images routinely captured during standard clinical care and stored in the PACS. A radiologist iteratively annotated and validated 1) examined body parts (single-label classification) and 2) visible body parts (multi-label classification) using 36 relevant SNOMED CT codes.

RESULTS: The proposed model achieved an accuracy of 0.889 for classifying examined body parts and 0.853 for classifying visible body parts on the curated dataset. The approach demonstrated advantages in simplicity of use, universal availability, and the ability to enhance data quality. Reducing body-part labels from 116 distinct DICOM header entries to 36 SNOMED CT codes promises improved retrieval and more concise communication in future applications. In addition, combining the deep learning models with the initial DICOM headers achieved the best result, with a recall of 98.7% in our simulated use case.

CONCLUSION: Deep learning techniques show potential to address data standardization and quality issues, offering a technically feasible and cost-effective solution for annotating and reusing diverse medical images. Future work should enhance accuracy via multi-radiologist validation and explore methods such as unsupervised or online learning.
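The core normalization step described above, collapsing many free-text DICOM `BodyPartExamined` header entries into a small set of SNOMED CT codes, can be sketched as a simple lookup. This is a minimal illustration only: the mapping entries and code values below are illustrative examples, not the paper's actual 36-code table.

```python
# Minimal sketch: normalizing free-text DICOM BodyPartExamined entries
# to a reduced SNOMED CT code set. The entries below are illustrative
# examples, NOT the study's actual mapping of 116 entries to 36 codes.
from typing import Optional

HEADER_TO_SNOMED = {
    # Several raw header spellings can map to one canonical code.
    "CHEST": "51185008",       # illustrative code for thoracic structure
    "THORAX": "51185008",
    "HAND": "85562004",        # illustrative code for hand structure
    "LEFT HAND": "85562004",
}

def normalize_body_part(header_value: str) -> Optional[str]:
    """Map a raw DICOM header entry to a SNOMED CT code, or None if unmapped."""
    return HEADER_TO_SNOMED.get(header_value.strip().upper())

print(normalize_body_part("chest"))   # -> 51185008
print(normalize_body_part("Pelvis"))  # -> None (unmapped entries need review)
```

In the study's pipeline, unmapped or inconsistent entries are where the deep learning classifiers add value over the headers alone.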