Machine learning approach for the detection of vitamin D level: a comparative study

Abstract

BACKGROUND: After the World Health Organization declared COVID-19 a pandemic, the role of vitamin D became even more critical for people worldwide. The most accurate way to determine vitamin D level is the 25-hydroxyvitamin D (25-OH-D) blood test; however, this test is not always feasible. Most data sets used in health science research contain highly correlated features, which is referred to as the multicollinearity problem. This problem can lead to misleading results and overfitting during machine learning (ML) training. Therefore, the proposed study aims to identify a clinically acceptable ML model that accurately detects the vitamin D status of adult participants in North Cyprus without the need to measure the 25-OH-D level, while taking the multicollinearity problem into account.

METHOD: The study was conducted with 481 observations from participants who applied voluntarily to the Internal Medicine Department at NEU Hospital. The classification performance of four conventional supervised ML models, namely ordinal logistic regression (OLR), elastic-net ordinal regression (ENOR), support vector machine (SVM), and random forest (RF), was compared. The comparative analysis covered each model's sensitivity to the participants' metabolic syndrome (MtS)-positive status, hyperparameter tuning, sensitivity to the size of the training data, and classification performance.

RESULTS: The findings showed that, due to the presence of multicollinearity, the performance of the SVM with a radial basis function (RBF) kernel, SVM (RBF), was clearly degraded on the test set. Moreover, RF was clearly more robust than the other models when the size of the training data was varied. In this experiment, RF and ENOR also performed better than the other two models when the number of training samples was reduced. Since multicollinearity is more severe in small samples, it can be concluded that RF and ENOR are not affected by the multicollinearity problem. The comparative analysis revealed that the RF classifier performed better and was more robust than the other proposed models in terms of accuracy (0.94), specificity (0.96), sensitivity or recall (0.94), precision (0.95), F1-score (0.95), and Cohen's kappa (0.90).

CONCLUSION: RF performed better than SVM (RBF), ENOR, and OLR. These comparison findings will be used to develop an intelligent vitamin D level detection system that relies on routine clinical and biochemical tests and individuals' lifestyle characteristics, in order to decrease the cost and time of vitamin D level detection.
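The abstract does not say how multicollinearity was diagnosed. One common check is the variance inflation factor (VIF), where values above roughly 5 to 10 are conventionally read as problematic. The sketch below is an assumption, not the authors' procedure: the function name `variance_inflation_factors` is hypothetical and the data are synthetic. It relies on the identity that VIF_j equals the j-th diagonal entry of the inverse correlation matrix of the predictors, which is the same as 1 / (1 - R_j^2) from regressing feature j on all the others.

```python
import numpy as np


def variance_inflation_factors(X: np.ndarray) -> np.ndarray:
    """Return the variance inflation factor (VIF) of every column of X.

    X has shape (n_samples, n_features). VIF_j is the j-th diagonal entry of
    the inverse Pearson correlation matrix, which equals 1 / (1 - R_j^2) where
    R_j^2 comes from regressing feature j on all other features (with an
    intercept). Perfectly collinear columns make the matrix singular, and
    np.linalg.inv then raises LinAlgError.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)  # feature-by-feature correlation matrix
    return np.diag(np.linalg.inv(corr))


# Synthetic illustration only (not the study's data): two nearly collinear
# features and one independent feature, using the same n as the study.
rng = np.random.default_rng(0)
n = 481
feature_a = rng.normal(size=n)
feature_b = feature_a + rng.normal(scale=0.1, size=n)  # almost a copy of feature_a
feature_c = rng.normal(size=n)                          # unrelated to the others
X = np.column_stack([feature_a, feature_b, feature_c])

vif = variance_inflation_factors(X)
print(np.round(vif, 2))  # feature_a and feature_b well above 10, feature_c close to 1
```

Computing all VIFs from one matrix inverse avoids fitting a separate regression per feature; an explicit per-feature regression gives the same numbers.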
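The reported metrics (accuracy, specificity, sensitivity or recall, precision, F1-score, and Cohen's kappa) can all be derived from a multi-class confusion matrix. The abstract does not state how per-class values were averaged for the ordinal vitamin D outcome, so the sketch below assumes one-vs-rest metrics with macro-averaging. The helper names, the three class labels, and the toy predictions are hypothetical and are not the study's results.

```python
from statistics import mean
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """One-vs-rest per-class metrics, macro-averaged, plus Cohen's kappa."""
    k = len(cm)
    total = sum(map(sum, cm))
    row_sums = [sum(cm[i]) for i in range(k)]                       # true counts per class
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted counts per class
    correct = sum(cm[i][i] for i in range(k))

    sens, spec, prec, f1 = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = row_sums[c] - tp
        fp = col_sums[c] - tp
        tn = total - tp - fn - fp
        recall = safe_div(tp, tp + fn)
        precision = safe_div(tp, tp + fp)
        sens.append(recall)
        spec.append(safe_div(tn, tn + fp))
        prec.append(precision)
        f1.append(safe_div(2 * precision * recall, precision + recall))

    p_observed = correct / total
    # Chance agreement: product of true and predicted marginals, summed over classes.
    p_expected = sum(row_sums[c] * col_sums[c] for c in range(k)) / total ** 2
    return {
        "accuracy": p_observed,
        "sensitivity": mean(sens),
        "specificity": mean(spec),
        "precision": mean(prec),
        "f1": mean(f1),
        "kappa": safe_div(p_observed - p_expected, 1 - p_expected),
    }


# Toy labels for three hypothetical ordinal classes (0, 1, 2); not study data.
y_true = [0] * 6 + [1] * 6 + [2] * 8
y_pred = [0] * 5 + [1] + [0] + [1] * 4 + [2] + [1] + [2] * 7
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)  # [[5, 1, 0], [1, 4, 1], [0, 1, 7]]
print({name: round(value, 3) for name, value in macro_metrics(cm).items()})
```

Macro-averaging weights every class equally; a class-frequency-weighted average would give different numbers on imbalanced data, so the choice should match whatever averaging the authors actually used.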
