Abstract
Colorectal cancer (CRC) is a significant global health burden characterized by prolonged asymptomatic progression and high mortality. CRC is more curable when detected at an early stage, and removing precancerous adenomas can prevent the disease, which underscores the importance of screening. This prospective study, conducted between 2024 and 2025 with 100 randomly recruited participants, investigates electronic nose (eNose) analysis of volatile organic compounds (VOCs) in biological matrices for CRC diagnosis using both unsupervised and supervised machine learning (ML) techniques. After detailed medical examinations, laboratory tests, and colonoscopy, 50 patients with confirmed stage III CRC and 50 healthy controls consented to eNose analysis of their blood, urine, and stool samples. Principal component analysis (PCA), logistic regression (LR), k-nearest neighbors (KNN), support vector machine (SVM), and gradient boosting (GB) were used to analyze eNose VOC patterns in all three matrices. Clinical and hematological alterations in CRC patients were consistent with systemic malignancy, including reduced weight, mild anemia, leukopenia, thrombocytopenia, and hypoalbuminemia, all of which are established indicators of disease severity and prognostic markers. VOC responses were elevated in CRC patients across all matrices, with blood and stool proving most informative owing to favorable signal-to-noise ratios. The ensemble-based GB and proximity-based KNN models outperformed the LR classifier, with GB exhibiting balanced and adaptable performance across the different biological matrices. Restricting the study to stage III CRC patients improved VOC signal clarity but limited generalizability to early-stage disease, a constraint mitigated by Gaussian augmentation, which enriched data variability and boosted model performance for screening applications.
Thus, eNose-based ML systems provide a globally accessible, innovative, non-invasive, and affordable solution for CRC detection, combining high sensitivity and specificity to support widespread early diagnosis.
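As an illustration of the Gaussian augmentation step mentioned above, the following is a minimal sketch rather than the study's actual pipeline. The function name `gaussian_augment`, the noise scale of 5% of each feature's standard deviation, the number of noisy copies, and the synthetic 8-sensor data are all assumptions made for the example; the abstract does not specify them.

```python
import numpy as np

def gaussian_augment(X, y, copies=2, noise_scale=0.05, seed=0):
    """Return X and y extended with noisy copies of every sample.

    Each copy adds zero-mean Gaussian noise whose per-feature standard
    deviation is noise_scale * std(feature). Both `copies` and
    `noise_scale` are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    sigma = noise_scale * X.std(axis=0, keepdims=True)  # shape (1, n_features)
    X_parts, y_parts = [X], [y]
    for _ in range(copies):
        X_parts.append(X + rng.normal(size=X.shape) * sigma)
        y_parts.append(y)  # labels are unchanged by the perturbation
    return np.vstack(X_parts), np.concatenate(y_parts)

# Synthetic stand-in for eNose sensor responses: 100 samples, 8 sensors
# (hypothetical dimensions), 50 controls (0) followed by 50 CRC cases (1).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 8))
y_demo = np.repeat([0, 1], 50)

X_aug, y_aug = gaussian_augment(X_demo, y_demo, copies=2)
```

In a workflow of this kind, augmentation would normally be applied only to the training portion of each split, so that noisy copies of a test sample never leak into training.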