Abstract
SHapley Additive exPlanations (SHAP) analysis has been applied to disease diagnosis and treatment-effect evaluation, but its use in predicting and diagnosing dairy cow diseases remains limited. We investigated whether the variance and autocorrelation of deviations in daily activity, rumination time, and milk electrical conductivity, along with daily milk yield, could be used to predict clinical mastitis in dairy cows with popular machine learning (ML) algorithms, and we used SHAP analysis to identify the key predictive features. Raw data from mastitic and healthy cows were processed using quantile regression (QR) with second- or third-order polynomial models fitted at the median or upper quantiles. Logistic regression identified nine variables from the 14-day period preceding mastitis onset as significantly associated with mastitis. These variables were used to train and validate prediction models built with eleven classical ML algorithms. Among them, the partial least squares model performed best, achieving an AUC of 0.789, sensitivity of 0.500, specificity of 0.947, accuracy of 0.793, precision of 0.833, and F1-score of 0.625. SHAP analysis showed that three features contributed positively to mastitis prediction, whereas two contributed negatively. These findings provide a theoretical basis for developing clinical decision-support tools in commercial farming settings.
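As an illustration of the two summary features named above, the following minimal sketch computes the variance and autocorrelation of a series of daily deviations. It assumes the deviations (observed value minus the QR-fitted value) are already available, uses a lag of one day for the autocorrelation, and uses the sample (n - 1) variance; the abstract specifies none of these choices. The function names and the example values are hypothetical and are not taken from the study.

```python
def sample_variance(deviations):
    """Sample variance (n - 1 denominator) of a series of daily deviations."""
    n = len(deviations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(deviations) / n
    return sum((d - mean) ** 2 for d in deviations) / (n - 1)


def lag1_autocorrelation(deviations):
    """Lag-1 autocorrelation of the deviations (assumed lag; not from the source).

    Returns 0.0 for a constant series, where the statistic is undefined.
    """
    n = len(deviations)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(deviations) / n
    centered = [d - mean for d in deviations]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0
    # Sum of products of each centered value with its successor.
    numer = sum(a * b for a, b in zip(centered, centered[1:]))
    return numer / denom


# Hypothetical deviations for one cow over a 14-day window (illustrative only).
example_deviations = [0.4, -0.2, 0.1, 0.6, -0.5, 0.3, 0.0,
                      -0.1, 0.7, -0.4, 0.2, 0.5, -0.3, 0.1]

features = {
    "variance": sample_variance(example_deviations),
    "lag1_autocorrelation": lag1_autocorrelation(example_deviations),
}
print(features)
```

In a workflow like the one described, such features would be computed per cow and per sensor variable over the 14 days preceding mastitis onset and then passed to the classifiers.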