Abstract
BACKGROUND: Colon cancer remains a leading cause of cancer-related mortality worldwide, underscoring the need for diagnostic methods that improve early detection and patient outcomes. METHODS: This study introduces ColoViT, a hybrid diagnostic framework for colonoscopic image analysis that combines EfficientNet with Vision Transformers. EfficientNet provides scalable, efficient feature extraction, while the Vision Transformer captures global contextual information across each colonoscopic image. RESULTS: By combining these two components, ColoViT improves the detection of precancerous lesions and early-stage colon cancers. In a preliminary evaluation, the model achieved a recall of 92.4%, a precision of 98.9%, an F1-score of 98.4%, and an AUC of 0.99. CONCLUSION: ColoViT outperformed the existing models it was compared with, offering a robust deep learning approach to the early detection of colon cancer from colonoscopic images.
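The results above are reported with standard binary-classification metrics. The following is a minimal sketch of how these metrics are conventionally defined, assuming a binary lesion-versus-normal labeling. It is not the evaluation code used for ColoViT. The function names `precision_recall_f1` and `roc_auc` are hypothetical, and "AUC" is assumed here to mean the area under the ROC curve, computed in its pairwise (Mann-Whitney) form.

```python
from typing import Sequence


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Binary precision, recall, and F1 from confusion-matrix counts.

    Hypothetical helper for illustration only; not the authors' code.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative; ties count as one half.

    Assumes labels are 1 (lesion) or 0 (normal). O(n_pos * n_neg), which is
    fine for illustration but not for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with hypothetical counts of 90 true positives, 10 false positives, and 30 false negatives, `precision_recall_f1(90, 10, 30)` gives a precision of 0.90, a recall of 0.75, and an F1-score of about 0.818. Because F1 is a harmonic mean, it always lies between precision and recall and sits closer to the smaller of the two.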