Abstract
OBJECTIVES: Thalassemia trait (TT) screening in resource-limited settings is hampered by reliance on expensive and complex tests. This study aimed to develop and validate an accessible machine learning (ML) tool that uses only routine blood parameters to differentiate TT from non-TT and to distinguish its major subtypes. METHODS: This retrospective study included 987 individuals (221 α-TT, 211 β-TT, and 555 non-TT) from two medical centers. Seven ML methods (Logistic Regression, Gaussian Naive Bayes, Decision Tree, Random Forest, Multilayer Perceptron, XGBoost, and CatBoost) were used to develop diagnostic models, which were evaluated using accuracy, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), positive predictive value (PPV), negative predictive value (NPV), and F1 score. RESULTS: The CatBoost model performed best in differentiating TT from non-TT, achieving an AUC of 0.976, an accuracy of 0.940, and a specificity of 0.981. It also outperformed the other models in distinguishing α-TT from β-TT (AUC = 0.842). The model was deployed as a user-friendly WeChat mini-program, AI Lab, for real-world clinical application. CONCLUSION: The deployed ML-based AI Lab is a robust, interpretable, and scalable tool with the potential to enhance the efficiency and accessibility of TT screening, particularly in underserved healthcare environments.
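
For readers less familiar with the threshold-based evaluation metrics named in METHODS, the following is a minimal sketch of how accuracy, sensitivity, specificity, PPV, NPV, and F1 score follow from the four cells of a binary confusion matrix (TT as the positive class). The function name `screening_metrics` and the counts in the usage example are hypothetical and chosen only for illustration; they are not the study's data or code. The sketch assumes all denominators are nonzero.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute binary screening metrics from confusion-matrix counts.

    tp, fp, tn, fn: true positives, false positives, true negatives,
    false negatives. Assumes every denominator below is nonzero.
    """
    sensitivity = tp / (tp + fn)   # share of true TT cases that are detected
    specificity = tn / (tn + fp)   # share of non-TT cases correctly ruled out
    ppv = tp / (tp + fp)           # probability of TT given a positive call
    npv = tn / (tn + fn)           # probability of non-TT given a negative call
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not taken from the study).
example = screening_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

AUC is not included because it is computed from the model's predicted probabilities across all thresholds rather than from a single confusion matrix.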