Abstract
Conservation and breeding of the western honey bee (Apis mellifera) depend on accurate subspecies assignment, but the most commonly used methods are labor-intensive classical morphometrics and costly molecular assays. We developed an XGBoost-based classification framework that uses a compact set of routinely measurable characters. A curated dataset of labeled workers was measured under harmonized protocols; features were screened by embedded feature importance, and model performance was assessed using five-fold cross-validation and compared against standard machine learning baselines, which the framework outperformed. The resulting model, using only the top 10 characters (primarily forewing venation angles and abdominal plate metrics), achieved high performance (accuracy = 0.98; F1 = 0.99) and an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI = 0.995-0.999). SHAP (SHapley Additive exPlanations) analyses confirmed the discriminatory contributions of these features, while error inspection suggested that misclassifications were concentrated in morphologically overlapping lineages. The model's performance supports its use as a rapid triage tool alongside genetic testing. The framework gives researchers a scalable and interpretable way to create and deploy custom morphometric models, demonstrated here for A. mellifera but portable to other insect taxa.
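The abstract reports an AUC with a 95% confidence interval but does not say how the interval was obtained. A percentile bootstrap is one common choice, and the minimal sketch below illustrates it in plain Python. It is not the authors' procedure: the binary labels, the scores, and the function names `auc` and `bootstrap_auc_ci` are hypothetical, and a multi-class subspecies problem would first need a one-vs-rest or similar reduction.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, not taken
    from the source). Resamples that contain only one class are redrawn."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined for a single-class resample
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data (hypothetical): 1 = target subspecies, 0 = any other.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.7, 0.8, 0.9]

point = auc(labels, scores)  # 15 of 16 positive/negative pairs are ordered correctly
lo, hi = bootstrap_auc_ci(labels, scores)
print(f"AUC = {point:.4f}, 95% CI = [{lo:.4f}, {hi:.4f}]")
```

With only eight toy observations the interval is wide. On a real dataset the same procedure would be applied to held-out predictions, for example those pooled across cross-validation folds.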