Abstract
This study evaluates eight machine-learning regression models for estimating serum vitamin D levels as a support tool for vitamin D deficiency assessment. A cohort dataset of 100 individuals (dataset 132) was analyzed using Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN/MLP), Linear Regression (LR), Elastic Net (EN), Ridge Regression (RR), Lasso Regression (LSR), and RANSAC Regressor (RAN). Model performance was assessed over 30 repeated runs using mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination (R²). SVM yielded the strongest overall results (MAE = 1.841, MSE = 32.502, mean R² = 0.9981), followed by RF (MAE = 7.571, MSE = 197.832, mean R² = 0.9908). ANN showed intermediate performance (mean R² = 0.8538), whereas RR was the weakest model (mean R² = 0.4945). To address interpretability, the revised manuscript adds an explainability-oriented feature-attribution analysis derived from the correlation structure of the cohort. The strongest associations with vitamin D were gender (r = 0.64), hemoglobin (r = 0.47), age (r = 0.38), marital status (r = -0.35), and triglycerides (r = 0.32). These findings show that model choice substantially affects predictive performance and that nonlinear models, particularly SVM and RF, can support cost-conscious screening strategies for vitamin D deficiency assessment. Future work should validate the models on larger external cohorts and extend interpretability with model-specific explainability techniques.
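As a minimal illustration of the repeated-run evaluation protocol summarized above (not the authors' actual pipeline), the following Python sketch computes MAE, MSE, RMSE, and R² over 30 random train/test splits with scikit-learn; the feature matrix X, target vector y, split ratio, and default model hyperparameters are assumptions introduced here for clarity.

    # Illustrative sketch only: repeated-run evaluation of two of the
    # regressors named in the abstract. X (features) and y (serum vitamin D
    # levels) are assumed to be already loaded as NumPy arrays.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    models = {"SVM": SVR(), "RF": RandomForestRegressor()}
    n_runs = 30  # number of repeated runs, as reported in the abstract

    def evaluate(model, X, y, seed):
        # One run: split, fit, predict, and score on the held-out set.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed
        )
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        mse = mean_squared_error(y_te, pred)
        return {
            "MAE": mean_absolute_error(y_te, pred),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "R2": r2_score(y_te, pred),
        }

    # Average each metric over the repeated runs for every model.
    for name, model in models.items():
        runs = [evaluate(model, X, y, seed) for seed in range(n_runs)]
        means = {k: np.mean([r[k] for r in runs]) for k in runs[0]}
        print(name, means)

The remaining models listed in the abstract (LR, EN, RR, LSR, RAN, ANN/MLP) could be added to the models dictionary in the same way; the exact preprocessing and hyperparameter choices used in the study are not specified here.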