Abstract
BACKGROUND: Otitis media remains a significant global health concern, particularly in resource-limited settings where timely diagnosis is challenging. Artificial intelligence (AI) offers promising solutions to enhance diagnostic accuracy in mobile health applications.

OBJECTIVE: This study introduces a hybrid AI framework that integrates convolutional neural networks (CNNs) for image classification with large language models (LLMs) for clinical reasoning, enabling real-time otoscopic diagnosis.

METHODS: We developed a dual-path system combining CNN-based feature extraction with LLM-supported interpretation. The framework was optimized for mobile deployment, with lightweight models operating on-device and advanced reasoning performed via secure cloud APIs. A dataset of 10,465 otoendoscopic images (expanded from 2,820 original clinical images through data augmentation) across 10 middle-ear conditions was used for training and validation. Diagnostic performance was benchmarked against clinicians of varying expertise.

RESULTS: The hybrid CNN-LLM system achieved an overall diagnostic accuracy of 97.6%, demonstrating the synergistic benefit of combining CNN-driven visual analysis with LLM-based clinical reasoning. The system delivered feedback in under 200 ms and achieved specialist-level performance in identifying common ear pathologies.

CONCLUSIONS: This hybrid AI framework substantially improves diagnostic precision and responsiveness in otoscopic evaluation. Its mobile-friendly design supports scalable deployment in telemedicine and primary care, offering a practical solution to enhance ear disease diagnosis in underserved regions.
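
To make the dual-path design described in the METHODS concrete, the following is a minimal sketch of how an on-device CNN classification step could feed a cloud LLM reasoning step. All names (the label set, the MobileNetV3 backbone, the prompt-building helper, and the example image path) are illustrative assumptions, not the authors' implementation; the actual cloud API call is deployment-specific and therefore omitted.

```python
# Sketch of the dual-path hybrid: (1) a lightweight on-device CNN produces
# class probabilities for an otoendoscopic image; (2) the top predictions are
# packaged into a prompt for a cloud-hosted LLM that returns clinical reasoning.
# Hypothetical labels, backbone, and helpers -- not the paper's actual code.

import torch
from torchvision import models, transforms
from PIL import Image

# Placeholder label set; the paper reports 10 middle-ear conditions but does
# not enumerate them in the abstract.
CLASSES = [f"condition_{i}" for i in range(10)]

# Lightweight backbone suited to mobile deployment, with a 10-way head.
cnn = models.mobilenet_v3_small(weights=None, num_classes=len(CLASSES))
cnn.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify(image_path: str, top_k: int = 3):
    """On-device path: return the top-k (label, probability) pairs."""
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(cnn(img), dim=1).squeeze(0)
    values, indices = probs.topk(top_k)
    return [(CLASSES[int(i)], float(v)) for v, i in zip(values, indices)]

def build_llm_prompt(findings):
    """Cloud path: format CNN findings into a prompt for an LLM reasoning call."""
    lines = [f"- {label}: {prob:.1%}" for label, prob in findings]
    return (
        "Otoendoscopic image classifier output:\n"
        + "\n".join(lines)
        + "\nProvide a brief differential diagnosis and recommended next step."
    )

if __name__ == "__main__":
    findings = classify("example_otoscopic_image.jpg")  # hypothetical input
    print(build_llm_prompt(findings))
```

In this arrangement, only the compact classifier runs on the handset, keeping per-image latency low, while the heavier language-model reasoning is deferred to a secure cloud endpoint, consistent with the mobile deployment strategy outlined above.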