An efficient approach on risk factor prediction related to cardiovascular disease around Kumbakonam, Tamil Nadu, India, using unsupervised machine learning techniques



Abstract

People today suffer from a wide variety of diseases owing to environmental conditions and living habits. Cardiovascular diseases (CVDs), which are heart-related diseases, are the leading cause of mortality among all diseases. In earlier times, the lack of technological advancement cost many lives: delayed diagnosis led to delayed treatment, which in turn led to loss of life. Predicting disease in advance has therefore become a necessity, because it supports the timely provision of treatment. This paper deals with risk factor prediction based on unsupervised learning methods and with identifying, by principal component analysis (PCA), the predominant parameters that are vital to the risk factors. We collected patient data of size 130 × 12 from four different laboratories in and around Kumbakonam, Tamil Nadu, India. Various clustering techniques, namely k-means clustering, partitioning around medoids (PAM) clustering, hierarchical clustering, and fuzzy clustering, were applied to the patient data, and the results show that the data can be grouped into two clusters: "patients with risk" and "patients with no risk". The optimal number of clusters was determined using the elbow and silhouette methods. The quality of the clustering was evaluated using the Hopkins statistic, Dunn's index, and average silhouette widths. The computed agglomerative coefficients indicate a strong cluster structure in the dataset. Cluster stability was tested using bootstrap cluster analysis, and the results showed that the clusters are highly stable. Feature selection was performed using PCA, from which it is inferred that, out of the 12 parameters, Total Cholesterol is the most highly correlated factor and plays an important role in identifying risk among patients.
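To make the cluster-count selection step concrete, the following is a minimal sketch of k-means clustering combined with the average silhouette width, written in plain Python. It is not the authors' code, and the abstract does not state which software was used. The data are synthetic (130 hypothetical records with three made-up features, drawn as two separated groups), and every function and variable name here is an assumption introduced for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct records
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: euclidean(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


def average_silhouette(points, labels):
    """Mean silhouette width over all points (singletons contribute 0)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(euclidean(p, q))
        own = dists[labels[i]]
        if not own:
            continue  # singleton cluster
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d)  # mean distance to nearest other cluster
                for c, d in dists.items() if c != labels[i] and d)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Synthetic stand-in for patient data (ASSUMPTION: not the study's dataset).
rng = random.Random(42)
no_risk = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(65)]
at_risk = [[rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(65)]
data = no_risk + at_risk

# Score candidate cluster counts and keep the one with the widest silhouette.
scores = {}
for k in range(2, 6):
    labels, _ = kmeans(data, k)
    scores[k] = average_silhouette(data, labels)
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: average silhouette width = {s:.3f}")
print("selected k =", best_k)
```

On data with two well-separated groups, the silhouette criterion is expected to favour two clusters, which mirrors the "risk" versus "no risk" split reported above. The other methods named in the abstract (PAM, hierarchical and fuzzy clustering, the Hopkins statistic, Dunn's index, bootstrapped stability, and PCA) are not covered by this sketch.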
