Abstract
BACKGROUND: Artificial intelligence (AI) based on deep learning has significantly advanced endoscopic diagnosis; however, its use for detecting Helicobacter pylori (H. pylori) infection remains limited. We aimed to develop and validate an AI diagnostic system (HOPE AI) for diagnosing H. pylori infection from large-scale imaging data obtained during clinical endoscopy.
METHODS: This multicenter diagnostic study was conducted across seven hospitals in China. Eligible patients were aged 18 years or older and had undergone upper gastrointestinal endoscopy. Endoscopic images were randomly allocated (7:3) to training and internal validation datasets for the development of HOPE AI, which used a multi-instance learning (MIL) framework and long short-term memory (LSTM) architectures; a prospective external validation dataset was used to assess its diagnostic efficacy. The performance of HOPE AI was also benchmarked against that of endoscopists. Diagnostic accuracy, sensitivity, specificity, and area under the curve (AUC) for detecting H. pylori infection were assessed.
RESULTS: A total of 308,887 endoscopic images and 197 videos from 6207 patients were used to develop and evaluate HOPE AI. The system achieved an AUC of 0.932 (95% confidence interval [CI] 0.906-0.956) in the internal validation set, 0.903 (0.883-0.922) in the external temporal validation set, 0.923 (0.875-0.961) in the external temporal validation video set, and between 0.855 (0.813-0.894) and 0.971 (0.955-0.985) across seven external geographical validation sets. The diagnostic sensitivity of HOPE AI (85.7%) significantly surpassed that of senior endoscopists (68.0%).
CONCLUSIONS: HOPE AI exhibited robust diagnostic efficacy and interpretability in H. pylori detection, thereby enhancing the efficiency of diagnosis in routine screening settings.
TRIAL REGISTRATION: Chinese Clinical Trial Registry: ChiCTR 2400091317, 2400091720.
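The metrics reported above (sensitivity, specificity, and AUC with a 95% CI) can be illustrated with a minimal sketch. The abstract does not state how the confidence intervals were obtained, so the percentile bootstrap used here is an assumption, and the labels, scores, 0.5 threshold, and function names are hypothetical toy values, not study data or the authors' code.

```python
import random

def sensitivity_specificity(labels, preds):
    """Sensitivity and specificity from binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class: AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data: 1 = infected, 0 = not infected.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

sens, spec = sensitivity_specificity(labels, preds)
point_auc = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
print(f"AUC={point_auc:.4f} (95% CI {ci_low:.3f}-{ci_high:.3f})")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two kinds of figures are reported separately.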