Abstract
INTRODUCTION: Schizophrenia is a severe mental disorder that affects approximately 1% of the general population and is diagnosed primarily on clinical criteria. Because objective diagnostic methods and reliable biomarkers are lacking, accurate diagnosis and effective treatment remain challenging. Peripheral blood biomarkers have recently attracted attention, and machine learning methods offer analytical capabilities that may improve diagnostic accuracy.

METHODS: This retrospective case-control study included 203 patients with schizophrenia treated over a five-year period at a tertiary hospital and 192 age- and sex-matched healthy controls. Demographic data and routine hematological and biochemical parameters were extracted from medical records. Variables with more than 85% missing data were excluded; the remaining missing values were imputed after train-test splitting to avoid data leakage. Optimal biomarker subsets were selected using Grey Wolf Optimization (GWO). Random Forest (RF), XGBoost, Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Logistic Regression (LR) models were trained and evaluated with stratified 10-fold cross-validation.

RESULTS: The two groups were comparable in age and sex. Before GWO-based feature selection, XGBoost (95.55%) and Random Forest (94.63%) yielded the highest accuracies. After feature selection, Random Forest accuracy rose to 94.95% with a recall of 96.25%, while XGBoost reached the highest accuracy (95.90%) together with strong specificity (95.54%). Post-optimization area under the curve (AUC) values were highest for XGBoost (0.96) and Random Forest (0.95), indicating strong diagnostic performance. Total protein, glucose, iron, creatine kinase, total bilirubin, uric acid, calcium, and sodium were the key biomarkers distinguishing patients with schizophrenia from controls. Notably, glucose levels were significantly lower in patients with schizophrenia than in controls, contrary to typical findings. Differences in triglycerides, liver enzymes, sodium, and potassium lacked clear clinical significance.

DISCUSSION: The machine learning models achieved diagnostic accuracy comparable to that reported in studies using more expensive biomarkers, which points to potential clinical and economic advantages. External validation is recommended to confirm the generalizability and clinical utility of these findings.
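
The leakage-avoiding evaluation described in METHODS (imputation performed only after the train-test split, inside stratified 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the study's code: median imputation is an assumption (the abstract does not name the imputation method), the function names `stratified_folds`, `fit_medians`, `apply_medians`, and `cross_validation_splits` are hypothetical, and the data are synthetic values that only reuse the reported group sizes (203 and 192).

```python
import random
from statistics import median


def stratified_folds(labels, k=10, seed=0):
    """Assign sample indices to k folds so each fold keeps a similar class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold per class so total fold sizes stay balanced
    for cls in sorted(set(labels)):
        idx = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)
    return folds


def fit_medians(rows):
    """Per-feature medians computed from observed (non-None) values only."""
    medians = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        medians.append(median(observed) if observed else 0.0)
    return medians


def apply_medians(rows, medians):
    """Replace missing values (None) with the supplied per-feature medians."""
    return [[medians[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def cross_validation_splits(X, y, k=10, seed=0):
    """Yield (X_train, y_train, X_test, y_test) with train-only imputation."""
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Imputation statistics come from the training fold only; the held-out
        # fold never contributes to them, which is what prevents leakage.
        medians = fit_medians([X[i] for i in train_idx])
        X_train = apply_medians([X[i] for i in train_idx], medians)
        X_test = apply_medians([X[i] for i in test_idx], medians)
        yield X_train, [y[i] for i in train_idx], X_test, [y[i] for i in test_idx]


# Synthetic toy data (NOT study data): 3 features with roughly 10% missing values.
rng = random.Random(42)
y = [1] * 203 + [0] * 192
X = [[None if rng.random() < 0.1 else rng.gauss(0, 1) for _ in range(3)] for _ in y]
splits = list(cross_validation_splits(X, y, k=10))
```

In each split, a classifier would be fitted on `X_train` and scored on `X_test`. Any feature selection step, such as GWO, would also need to run inside the training fold for the same reason.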