Abstract
BACKGROUND: Ankylosing spondylitis often presents with nonspecific symptoms, making the identification of high-risk individuals challenging in clinical practice. OBJECTIVE: This study aimed to utilize blood cell indices to construct interpretable machine learning models to assist in clinical triage and the identification of patients at high risk for ankylosing spondylitis. METHODS: A retrospective case-control study was conducted involving 17,504 participants, comprising 4,903 patients with ankylosing spondylitis and 12,601 controls with low back pain. Recursive feature elimination was applied to identify key variables, and six machine learning models were developed to diagnose ankylosing spondylitis using blood cell indices. The best-performing model was identified and compared with established biomarkers through receiver operating characteristic curve analysis. External validation was carried out using data from the Fourth People's Hospital of Nanning. The SHapley Additive Explanations method was applied to interpret the model and evaluate the contribution of individual indices to diagnostic predictions. In addition, to examine the independent associations between blood cell indices and ankylosing spondylitis risk while minimizing selection bias, propensity score matching was conducted, followed by binary logistic regression on the matched cohort. RESULTS: Among the diagnostic models, the light gradient boosting machine model demonstrated the best performance, with areas under the curve of 0.866 in the test set and 0.872 in the external validation set. Several blood cell indices showed significant associations with ankylosing spondylitis. CONCLUSION: The light gradient boosting machine model exhibited reliable diagnostic performance for ankylosing spondylitis, and interpretable machine learning approaches provided insights into the contributions of specific hematologic parameters. These findings suggest that blood cell indices, as inexpensive and widely available markers, may serve as a tool for clinical triage and prioritizing high-risk individuals for further diagnostic evaluation.
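As an illustration of the receiver operating characteristic comparison described in the methods, the following is a minimal sketch and not the study's code. It computes the area under the curve as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half (the rank-based Mann-Whitney formulation). The function name `roc_auc` and the toy labels and scores are hypothetical and are not drawn from the study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve from binary labels (1 = case, 0 = control).

    Uses the rank-sum identity: AUC = U / (n_pos * n_neg), where U is the
    Mann-Whitney statistic of the case scores. Tied scores share an
    average rank, so a tie between a case and a control contributes 0.5.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")

    pairs = sorted(zip(scores, labels))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the block of tied scores pairs[i:j].
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # 1-based ranks i+1 .. j share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical toy data: two controls followed by two cases.
    toy_labels = [0, 0, 1, 1]
    toy_model_scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(toy_labels, toy_model_scores))  # 0.75
```

Applying the same function to a model's predicted probabilities and to a single biomarker's raw values, on the same participants, gives two areas under the curve that can be compared directly. A formal comparison would additionally need a significance test for correlated curves, which this sketch does not include.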