Abstract
Background/Objectives: Artificial intelligence (AI)-assisted endoscopy has shown high sensitivity for early gastric cancer detection; however, false-positive diagnoses remain a clinical challenge. This study aimed to evaluate the real-world diagnostic performance of a commercially available AI system and to identify factors associated with false-positive diagnoses, focusing on repeated AI evaluations and confidence stratification.

Methods: This single-center retrospective study included 47 patients with 89 localized gastric lesions evaluated between March 2024 and March 2025. Endoscopic examinations were performed under white-light, non-magnified observation with repeated AI assessments of each lesion. The rates of "Consider biopsy" (B) judgments were calculated. Lesions with a B judgment rate of ≥ 50% were defined as AI-positive and classified into four AI confidence categories. Diagnostic performance was assessed using sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Factors associated with false-positive diagnoses were analyzed using penalized logistic regression.

Results: The AI system demonstrated a sensitivity of 97.6% and an NPV of 95.7%, with a specificity of 45.8%. Pathology-positive rates decreased stepwise across the four AI confidence categories (p < 0.001). Among AI-positive lesions, low regional reproducibility, lesion size ≥ 30 mm, scar, and erosion were independently associated with false-positive diagnoses. In analyses restricted to non-neoplastic lesions, lesion size ≥ 30 mm remained significantly associated with false-positive diagnosis.

Conclusions: In real-world clinical practice, a commercially available AI system provides high sensitivity for early gastric cancer detection. Incorporating confidence stratification and regional reproducibility into clinical decision-making may enhance the effective use of AI-assisted endoscopic diagnosis beyond binary interpretations.
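The decision rule and metrics described in the Methods can be sketched in code. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the 2 × 2 counts in the usage example are chosen purely for illustration (they happen to be consistent with the reported rates and the 89-lesion total, but the actual study counts are not given in the abstract).

```python
def b_judgment_rate(judgments):
    """Fraction of repeated AI assessments returning a 'Consider biopsy' ('B') judgment."""
    return sum(1 for j in judgments if j == "B") / len(judgments)

def is_ai_positive(judgments, threshold=0.5):
    """A lesion is AI-positive when its B judgment rate is >= 50% (per the abstract)."""
    return b_judgment_rate(judgments) >= threshold

def diagnostic_performance(tp, fp, fn, tn):
    """Standard 2x2 diagnostic metrics against the pathology reference standard."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among pathology-positive lesions
        "specificity": tn / (tn + fp),  # true negatives among pathology-negative lesions
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Illustrative use: 3 of 4 repeated assessments returned 'B' -> AI-positive.
print(is_ai_positive(["B", "B", "N", "B"]))  # True (rate 0.75 >= 0.5)

# Hypothetical counts (tp=40, fp=26, fn=1, tn=22; total 89) reproduce the
# reported sensitivity 97.6%, specificity 45.8%, and NPV 95.7%.
perf = diagnostic_performance(tp=40, fp=26, fn=1, tn=22)
print({k: round(v, 3) for k, v in perf.items()})
```

The low specificity relative to sensitivity reflects the screening-oriented threshold: flagging any lesion with a ≥ 50% B judgment rate favors catching cancers at the cost of more false positives, which is why the paper then stratifies AI-positive lesions by confidence category.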